Skip to content

feat: company brain — typed records, digest, intake, notify, readable captures#356

Merged
plind-junior merged 14 commits into
testfrom
feat/company-brain
Jul 4, 2026
Merged

feat: company brain — typed records, digest, intake, notify, readable captures#356
plind-junior merged 14 commits into
testfrom
feat/company-brain

Conversation

@plind-junior

@plind-junior plind-junior commented Jul 4, 2026

Copy link
Copy Markdown
Collaborator

turns a vouch kb into a team memory that stays behind the review gate. every addition is a viewport or convention over machinery that already exists: records are config-declared page kinds, structured queries are filters over list_pages, proactivity is a one-shot cli in operator cron, and the llm stays host-side in the adapters. nothing new can reach proposals.approve() — the one gate change in here (protected kinds) tightens it.

what's in the branch:

  • company-brain init template: contact, org, project-record, meeting-notes, followup, decision-record, and voice declared as ordinary page_kinds config, plus a cited guide page. merging into config.yaml is additive and idempotent; operator-declared kinds always win. vouch init --template company-brain.
  • vouch init --template wiring for the onboarding template registry (templates layer on top of the starter seed).
  • five slash commands in the claude-code adapter, mirrored to the plugin skills array: /vouch-ask, /vouch-remember, /vouch-record, /vouch-followup, /vouch-standup. every flow terminates at kb_propose_*; none may call kb_approve.
  • frontmatter filters on kb.list_pages across mcp/jsonl/cli — kind equality, field equality, and inclusive bounds that order numbers and iso dates — plus the vouch pages human mirror. deliberately not a query language.
  • kb.digest / vouch digest: read-only reviewer briefing (pending oldest-first, recent decisions, stale claims, followups due, citation coverage), registered at all four sites. writes nothing; safe to run from cron.
  • vouch source fetch <url>: snapshot a url's exact bytes as a content-addressed source so claims cite immutable evidence. http/https only, every redirect hop re-validated, private-network hosts refused, 2 mib cap, fetched_at recorded in source metadata.
  • vouch inbox --dir <path> [--watch]: each dropped .md/.txt becomes a registered source plus one pending page proposal citing it — the capture.finalize shape with a file drop as the trigger. content-hash seen-state makes re-runs idempotent; the watch loop is a bounded stdlib poll, no daemon.
  • vouch notify sweep|test: config-declared reviewer webhooks (proposal.created, queue.backlogged, proposal.aged) with optional hmac-sha256 signing and env: secret refs. read-and-notify only, idempotent per (event, proposal), best-effort delivery that can never wedge a sweep.
  • protected page kinds: page_kinds.<kind>.protected: true exempts a kind from the trusted-agent self-approval opt-out, so policy-bearing pages (voice, decision-record in the template) always need a reviewer other than the proposer.
  • fix(page-kinds): string-typed frontmatter schema fields accept yaml's native date scalars. a bare due_at: 2026-07-01 loads as a date object from cli --meta parsing and from every frontmatter disk round-trip, so pages that validated at propose used to fail re-validation at approve.
  • feat(capture): auto-captured session proposals are titled from the session's first user prompt, extracted mechanically from the transcript the sessionend hook already passes (no llm) — instead of session summary: <project> (<uuid>). files/date fallback when no transcript survives; the uuid stays in the body.

testing: full gate green — 871 passed, 14 skipped, mypy clean, ruff clean. new suites: test_page_filters, test_digest, test_fetch, test_inbox, test_notify, test_brain_commands (including ast guards asserting the intake modules have no import or attribute path to approve()), plus extended onboarding, page-kinds, capture, and install-adapter tests. also exercised end-to-end through the real cli: init --template, propose a followup, approve, filter pages, run the digest.

docs: docs/company-brain.md covers the record conventions, the intake channels, webhooks, cron recipes, and the honest constraint that review throughput is the bottleneck by design. changelog updated under [Unreleased].

Summary by CodeRabbit

  • New Features

    • Added a read-only digest view for reviewing pending work, decisions, stale items, and followups.
    • Added page listing filters, inbox folder ingestion, source fetching, webhook notifications, and a transcript-aware session summary.
    • Added new Claude Code slash commands for asking, remembering, recording, followups, standups, and onboarding.
  • Bug Fixes

    • Improved page filtering and frontmatter handling.
    • Preserved date/datetime values in string fields across save and reload.
    • Added protected page kinds to prevent self-approval where required.

declares record kinds (contact, org, project-record, meeting-notes,
followup, decision-record, voice) as ordinary page_kinds config so both
gates validate them, and seeds a cited, approved guide page describing
the conventions. merging into config.yaml is additive: operator-declared
kinds win, and when the file has no page_kinds key the block is appended
textually so comments survive. idempotent via stable artifact ids.
…stry

templates layer on top of the starter seed instead of replacing it, so
an init'd kb always carries the walkthrough basics. click.Choice keeps
the flag self-documenting and rejects unknown names before any i/o.
the natural-language layer for a team-memory kb lives host-side as
prompts: ask answers only with citations, remember registers the
user's words as a source then proposes claims citing it, record files
entity + typed-page pairs, followup files dated commitments. every
flow terminates at kb_propose_* — none may call kb_approve. registered
in the claude-code install manifest (T3) and the plugin skills array.
typed record kinds put their structure in page frontmatter; this adds
one shared filter vocabulary over it — kind equality, field equality,
and inclusive ordered bounds (numbers, iso dates) — as a viewport over
store.list_pages(). deliberately not a query language: anything richer
belongs in the caller. same kb.list_pages method, new optional params,
plus a human mirror: vouch pages --kind followup --before due_at=...
…e followups

an operator returning to a kb reconstructs "what needs me" from several
commands; vouch digest folds them into one glance — pending proposals
oldest-first, decisions in the window (from the decided/ records),
stale claims behind the count metrics already reports, and open
followup pages whose due_at has arrived. strictly a viewport: composes
list_proposals, list_claims, list_pages and metrics.compute, writes
nothing, logs no audit event. registered at all four surfaces as
kb.digest plus a /vouch-standup slash command that narrates it.
yaml parses a bare due_at: 2026-07-01 as a date object — from the cli
--meta parser and from every frontmatter disk round-trip — so a
type: string schema field either rejects the natural spelling at
propose time or, worse, fails re-validation at approve time for a page
that validated when proposed. treat date/datetime as string-compatible
scalars; genuinely wrong types still fail.
one page covering the template, the record conventions, frontmatter
queries, the digest cron loop, and the slash commands — plus the honest
constraint that the review queue is the bottleneck by design.
evidence intake for web content: fetch the exact bytes once, register
them content-addressed, and let claims cite the immutable snapshot id
so the evidence a reviewer approved against survives the live page
drifting. conservative defaults for the first outbound network call in
the intake path: http/https only, every redirect hop re-validated,
hosts must resolve to public addresses, 2mib body cap, raw bytes only.
fetched_at and final_url recorded in source metadata; audit event
source.fetch. never imports proposals — sources are evidence, not
knowledge.
each new .md/.txt file in the folder is registered as a
content-addressed source and rolled into exactly one pending page
proposal citing it — the capture.finalize shape with a dropped file as
the trigger instead of a hook payload. content-hash seen-state sidecar
makes re-runs idempotent; edited files re-propose. --watch is a
bounded stdlib poll, no daemon. reads go through read_under_root, and
an ast test pins that the module never imports an approve path.
the gate only works if a human notices it has work. vouch notify sweep
evaluates the pending queue against config-declared webhooks — new
proposals, a backlog crossing its threshold, proposals aged past a
cutoff — and posts a small json envelope (optionally hmac-signed,
secrets via the env: convention serve.bearer_token uses). delivery is
best-effort and idempotent per (event, proposal); state lives in the
derived state.db so losing it can only re-notify. read-and-notify
only: no path here proposes, approves, or edits.

protected page kinds tighten the gate the one direction invariants
welcome: a kind with protected: true is exempt from the trusted-agent
self-approval opt-out, so policy-bearing pages (voice, decision
records — both marked in the company-brain template) always need a
reviewer other than the proposer.
the nl layer lives host-side as prompt files, so the strongest
deterministic guarantee is textual: every brain command must keep its
explicit never-approve instruction, every manifest skills path must
exist, and an ast walk pins that fetch/inbox/notify have no import or
attribute path to approve(). plus docs and changelog for the intake
and notification features.
the queue used to show 27 proposals all titled "session summary:
<project> (<uuid>)" — unreviewable at a glance. the sessionend hook
payload already carries transcript_path, so finalize now lifts the
human's first genuine prompt from the transcript (pure extraction —
host wrapper messages skipped, no model involved, capture's no-llm
rule holds) and titles the proposal "session: <prompt excerpt>", with
the prompt quoted in a new body section. without a transcript the
title falls back to what changed: "session <date>: web, docs — 9
file(s)". the uuid stays in the body for traceability.
@plind-junior plind-junior changed the base branch from main to test July 4, 2026 03:59
@coderabbitai

coderabbitai Bot commented Jul 4, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 513fec3c-759d-481d-adb6-c80784d85f5a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds a "company brain" knowledge workflow: an onboarding template with typed/protected page kinds, self-approval gating, frontmatter-based page filtering, a read-only digest command, content-addressed URL/source fetch, inbox file ingestion, HMAC-signed webhook notifications, transcript-derived capture titles, new Claude slash commands, and supporting docs and tests.

Changes

Company Brain Feature Set

Layer / File(s) Summary
Onboarding template and vouch init --template
src/vouch/onboarding.py, src/vouch/cli.py, CHANGELOG.md, tests/test_onboarding.py
Adds BRAIN_PAGE_KINDS/seed_company_brain_kb with idempotent config.yaml merging and wires --template into vouch init.
Slash commands, manifest, and docs
adapters/claude-code/.claude/commands/vouch-*.md, adapters/claude-code/install.yaml, openclaw.plugin.json, docs/company-brain.md, tests/test_brain_commands.py, tests/test_install_adapter.py
Adds five new Claude command docs, wires them into the install manifest and plugin skills list, and adds the company-brain workflow doc plus guardrail tests.
Protected page kinds and self-approval gate
src/vouch/page_kinds.py, src/vouch/proposals.py, src/vouch/cli.py, tests/test_page_kinds.py
Adds a protected flag/is_protected, YAML date/datetime schema acceptance, and blocks self-approval for protected page kinds.
Frontmatter filters and kb.digest
src/vouch/page_filters.py, src/vouch/digest.py, src/vouch/server.py, src/vouch/jsonl_server.py, src/vouch/cli.py, src/vouch/capabilities.py, tests/test_page_filters.py, tests/test_digest.py
Adds filter_pages/parse_kv for equality/bounds filtering used by kb_list_pages/vouch pages, and a digest.build/render pipeline exposed via vouch digest and kb.digest.
Content-addressed URL fetch
src/vouch/fetch.py, src/vouch/cli.py, tests/test_fetch.py
Adds fetch_url/snapshot_url with scheme/host/redirect/size safety guards and wires vouch source fetch.
Inbox ingestion
src/vouch/inbox.py, src/vouch/cli.py, tests/test_inbox.py
Adds idempotent inbox scan/watch that files pending proposals from dropped files and wires vouch inbox.
Webhook notifications
src/vouch/notify.py, src/vouch/cli.py, tests/test_notify.py
Adds load_webhooks/deliver/sweep/send_test with HMAC signing and idempotence state, wired via `vouch notify sweep
Capture transcript titles
src/vouch/capture.py, src/vouch/cli.py, tests/test_capture.py
Extracts the first user prompt from a transcript to drive session summary titles in build_summary_body/finalize.

Estimated code review effort: 4 (Complex) | ~75 minutes

Sequence Diagram(s)

sequenceDiagram
  participant CLI as vouch inbox
  participant Inbox as inbox.scan
  participant Store as KBStore
  CLI->>Inbox: scan(store, directory, config)
  Inbox->>Store: read_under_root, put_source
  Inbox->>Store: propose_page (if new/changed)
  Store-->>Inbox: proposal id
  Inbox-->>CLI: ScanResult(proposed, skipped)
Loading
sequenceDiagram
  participant CLI as vouch notify sweep
  participant Notify as notify.sweep
  participant Store as KBStore
  participant Webhook as Webhook endpoint
  CLI->>Notify: sweep(store, now)
  Notify->>Store: load_webhooks, list pending proposals
  Notify->>Webhook: deliver(envelope, secret)
  Webhook-->>Notify: HTTP status
  Notify->>Store: save idempotence state
  Notify-->>CLI: fired event names
Loading

Possibly related issues

Possibly related PRs

  • vouchdev/vouch#111: Both modify the approval/review gating path in src/vouch/proposals.py.
  • vouchdev/vouch#151: Both work on the Claude Code adapter's tiered slash-command install manifest.
  • vouchdev/vouch#303: Both touch src/vouch/capture.py and the finalize/capture flow directly.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main company-brain onboarding, KB digest, intake, notify, and capture changes.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/company-brain

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@github-actions github-actions Bot added docs documentation, specs, examples, and repo guidance cli command line interface adapters agent host adapters and install manifests openclaw openclaw integration mcp mcp, jsonl, and http surfaces storage kb storage, migrations, schemas, and proposals tests tests and fixtures size: XL 1000 or more changed non-doc lines labels Jul 4, 2026
@plind-junior plind-junior changed the title Feat/company brain feat: company brain — typed records, digest, intake, notify, readable captures Jul 4, 2026
resolve openclaw.plugin.json to the 2026.6 loader dialect: id/kind/
version plus a single json-schema configSchema, skills as directories.
publish the five company-brain commands as
adapters/openclaw/skills/<name>/SKILL.md and extend the manifest test's
skill list to all nine. bump package.json to 1.1.0 so the version
lockstep test holds against pyproject. repoint the decision-log
screenshot shots at claims that exist in the reworked example fixture
and re-render the svgs.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Nitpick comments (3)
tests/test_brain_commands.py (1)

44-55: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low value

AST guard covers attribute-style calls but not getattr-based indirection.

The check catches from X import approve and .approve attribute access, but a call like getattr(life, "approve")(...) would bypass both checks. Given this is a defense-in-depth test rather than the sole safeguard (the real gate is _approval_block_reason in proposals.py), this is a low-priority gap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_brain_commands.py` around lines 44 - 55, The AST guard in
test_intake_modules_have_no_approve_import only checks ImportFrom names and
direct .approve attributes, so it can miss getattr-based access. Update the test
to also scan for getattr(...) calls that use the string "approve" (and any
similar indirection) in the same module walk, using the existing
test_intake_modules_have_no_approve_import logic as the place to extend the
safeguard.
src/vouch/cli.py (1)

1435-1447: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Accessing a private helper across module boundary.

notify_mod._resolve_env is underscore-prefixed (private to notify.py) yet called directly from cli.py. Consider exposing a public resolve_env/resolve_secret helper for cross-module use, matching how other notify_mod/fetch_mod functions are consumed as public API elsewhere in this file.

♻️ Suggested fix
-def _resolve_env(raw: str, *, what: str) -> str:
+def resolve_env(raw: str, *, what: str) -> str:

And update the call site to notify_mod.resolve_env(...).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/cli.py` around lines 1435 - 1447, The notify_test command in cli.py
is calling the private underscore-prefixed helper notify_mod._resolve_env across
a module boundary. Expose a public helper in notify.py (for example resolve_env
or resolve_secret) and update notify_test to call that public API instead,
following the same pattern used by other notify_mod/fetch_mod functions in this
file. Ensure the new public helper preserves the current behavior for resolving
--secret before send_test is invoked.
src/vouch/digest.py (1)

158-216: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Stale-claim logic is duplicated between digest.build() and metrics.compute().

stale_claims (rows) is derived from a bespoke loop here, while stale_total is pulled from a separate metrics.compute() call. The comment at Line 158-159 states they intend the same anchor/threshold semantics, but the criteria (retired-status set, anchor precedence, > vs >= boundary) live in two places. If they ever drift, the digest will show a stale_total count that doesn't match the number of stale_claims rows returned (independent of limit truncation), undermining the "read-only reviewer briefing" contract this module exists for.

Consider extracting the staleness predicate/anchor computation into a single shared helper in metrics.py that both compute() and digest.build() call.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/digest.py` around lines 158 - 216, The stale-claim selection logic
is duplicated between digest.build() and metrics.compute(), so the row list and
stale_total can drift. Extract the shared staleness predicate and anchor
calculation into a helper in metrics.py, and have both build() and compute() use
that helper for retired-status filtering, anchor precedence, and the freshness
boundary. Keep the existing limit handling only in digest.build(), but make the
stale_claims rows and stale_total derive from the same shared logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/company-brain.md`:
- Around line 21-29: The `followup` entry in the company-brain table is missing
a required-field marker for `due_at`, while `followup_status` is already marked
required. Update the `followup` row in the table to show both `due_at` and
`followup_status` as required, keeping the wording aligned with the existing
`kind`/`frontmatter`/`rule` structure so readers see the full requirement set.

In `@src/vouch/cli.py`:
- Around line 1436-1437: The `notify test` CLI option `--secret` currently
allows a literal secret to be passed directly, which should be avoided. Update
the `notify_test` command in `src/vouch/cli.py` so `--secret` follows the same
`env:VAR` convention used by the webhook config path, or otherwise make the help
text for `--secret` explicitly warn that raw values are exposed in shell history
and process listings. Make sure the handling around the `--url`/`--secret`
options and the `notify_test` flow consistently enforces or documents this
behavior.

In `@src/vouch/fetch.py`:
- Around line 116-126: The non-redirect HTTPError path in the fetch loop leaks
the underlying response because only the redirect branch closes e; update the
exception handling in the fetch logic around _opener.open in src/vouch/fetch.py
so every caught urllib.error.HTTPError is closed before raising FetchError. Keep
the existing redirect handling in place, but ensure the 4xx/5xx branch also
calls e.close() (or otherwise closes the response) before re-raising, using the
current fetch loop and current/target flow as the location to fix.
- Around line 64-138: The fetch path in fetch_url/_check_url still relies on
urllib to resolve the host again at connection time, which leaves a DNS
rebinding gap after the initial public-IP validation. Update the transport used
by fetch_url so it connects to the vetted IP address returned by _check_url,
while still sending the original hostname for Host/SNI, and keep redirect
re-validation intact. Use the existing symbols _check_url, fetch_url, and
_opener as the main places to wire in the pinned-transport behavior.

In `@src/vouch/inbox.py`:
- Around line 94-142: Persist inbox scan state incrementally inside scan()
instead of only at the end, so successfully processed files are remembered even
if a later put_source or propose_page call fails. Update the state immediately
after a file’s source/proposal is accepted in the loop, using _save_state
alongside _load_state and the existing state[str(resolved)] tracking, while
keeping ScanResult behavior unchanged. This keeps scan() idempotent across
partial failures and preserves deduplication for already-processed files.
- Around line 145-162: The watch() poll loop in inbox.py should not die on a
single scan() failure; add per-iteration exception handling around the
scan(store, directory) call inside watch(). Catch transient/file-specific
exceptions, log or report them with enough context, then continue to the next
tick so the background poller keeps running; use the existing watch() and scan()
symbols to place the fix.

In `@src/vouch/jsonl_server.py`:
- Around line 251-259: The kb.list_pages JSONL handler in _h_list_pages is
returning the full Page model instead of matching the MCP surface contract used
by kb_list_pages. Update the return shape to a curated dict with only the same
fields as server.py’s kb_list_pages (for example id, title, type, tags,
metadata) and avoid exposing the full page body/content through
_store().list_pages() results.
- Around line 101-110: The _h_digest helper is letting explicit null inputs
bypass its defaults, so handle None values before parsing in this function. In
_h_digest, normalize p["since"], p["stale_days"], and p["limit"] so that null is
treated the same as a missing key before passing values to
metrics_mod.parse_since, int(...), and digest_mod.build. Keep the existing
default constants from digest_mod and metrics_mod, but ensure the fallback is
applied whenever the provided value is None.

In `@src/vouch/notify.py`:
- Around line 189-280: The sweep() logic in notify.py treats EVENT_CREATED as a
single global state, so one webhook’s success can clear retry eligibility for
another webhook that failed. Update sweep() and its state handling so
created/aged delivery tracking is per-hook (for example keyed by hook.url)
instead of shared across all hooks, and only mark proposal ids delivered for the
specific hook after that hook’s deliver() succeeds. Preserve the existing
idempotent behavior, but make _load_state/_save_state and the created/aged state
updates reflect each webhook’s independent delivery outcome.
- Around line 84-121: The numeric webhook config parsing in load_webhooks is not
handled consistently: converting backlog_threshold directly with int(...) can
raise a raw ValueError. Update the webhook construction logic in load_webhooks
so invalid backlog_threshold values are caught and re-raised as
NotifyConfigError with a clear message, matching the existing validation style
used for notify.webhooks entries and _resolve_env fields.
- Around line 124-148: The deliver() webhook sender currently forwards hook_url
directly into urllib.request.urlopen(), which can allow गैर-http schemes like
file://; add the same scheme allowlist used in fetch.py before building the
Request or calling urlopen so only http and https URLs are accepted. Update
deliver() to validate hook_url early, log-and-return False on disallowed
schemes, and keep the rest of the POST flow unchanged.

In `@src/vouch/onboarding.py`:
- Around line 429-466: The _merge_page_kinds helper in onboarding.py is treating
a present-but-invalid page_kinds value the same as a missing key, which can
silently append a second top-level block and shadow the original config. Update
the page_kinds handling so that only a missing key triggers the append/merge
path, and any existing non-dict value raises a ValueError instead. Keep the fix
inside _merge_page_kinds, using the existing loaded/existing checks and
preserving the current merge behavior for valid mappings.

---

Nitpick comments:
In `@src/vouch/cli.py`:
- Around line 1435-1447: The notify_test command in cli.py is calling the
private underscore-prefixed helper notify_mod._resolve_env across a module
boundary. Expose a public helper in notify.py (for example resolve_env or
resolve_secret) and update notify_test to call that public API instead,
following the same pattern used by other notify_mod/fetch_mod functions in this
file. Ensure the new public helper preserves the current behavior for resolving
--secret before send_test is invoked.

In `@src/vouch/digest.py`:
- Around line 158-216: The stale-claim selection logic is duplicated between
digest.build() and metrics.compute(), so the row list and stale_total can drift.
Extract the shared staleness predicate and anchor calculation into a helper in
metrics.py, and have both build() and compute() use that helper for
retired-status filtering, anchor precedence, and the freshness boundary. Keep
the existing limit handling only in digest.build(), but make the stale_claims
rows and stale_total derive from the same shared logic.

In `@tests/test_brain_commands.py`:
- Around line 44-55: The AST guard in test_intake_modules_have_no_approve_import
only checks ImportFrom names and direct .approve attributes, so it can miss
getattr-based access. Update the test to also scan for getattr(...) calls that
use the string "approve" (and any similar indirection) in the same module walk,
using the existing test_intake_modules_have_no_approve_import logic as the place
to extend the safeguard.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 815b744b-9807-4085-b6fc-ad2292748f0a

📥 Commits

Reviewing files that changed from the base of the PR and between d80b1e2 and 1112fc9.

📒 Files selected for processing (32)
  • CHANGELOG.md
  • adapters/claude-code/.claude/commands/vouch-ask.md
  • adapters/claude-code/.claude/commands/vouch-followup.md
  • adapters/claude-code/.claude/commands/vouch-record.md
  • adapters/claude-code/.claude/commands/vouch-remember.md
  • adapters/claude-code/.claude/commands/vouch-standup.md
  • adapters/claude-code/install.yaml
  • docs/company-brain.md
  • openclaw.plugin.json
  • src/vouch/capabilities.py
  • src/vouch/capture.py
  • src/vouch/cli.py
  • src/vouch/digest.py
  • src/vouch/fetch.py
  • src/vouch/inbox.py
  • src/vouch/jsonl_server.py
  • src/vouch/notify.py
  • src/vouch/onboarding.py
  • src/vouch/page_filters.py
  • src/vouch/page_kinds.py
  • src/vouch/proposals.py
  • src/vouch/server.py
  • tests/test_brain_commands.py
  • tests/test_capture.py
  • tests/test_digest.py
  • tests/test_fetch.py
  • tests/test_inbox.py
  • tests/test_install_adapter.py
  • tests/test_notify.py
  • tests/test_onboarding.py
  • tests/test_page_filters.py
  • tests/test_page_kinds.py

Comment thread docs/company-brain.md
Comment on lines +21 to +29
| kind | frontmatter | rule |
|---|---|---|
| `contact` | `role` (required), `org`, `email` | a person you work with |
| `org` | `website` | a company or organisation |
| `project-record` | `record_status` (required), `owner` | one project's living record |
| `meeting-notes` | `date`, `attendees` | notes from one meeting |
| `followup` | `due_at`, `followup_status` (required), `owner` | a dated commitment |
| `decision-record` | — | must cite claims/sources |
| `voice` | — | must cite the examples it distills |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

followup row understates required fields.

The table marks only followup_status as (required) for followup, but the onboarding test confirms both fields are required: assert "due_at" in required" and "assert "followup_status" in required (per tests/test_onboarding.py). Update the table to mark due_at as required too, or readers will assume it's optional.

📝 Proposed fix
-| `followup` | `due_at`, `followup_status` (required), `owner` | a dated commitment |
+| `followup` | `due_at` (required), `followup_status` (required), `owner` | a dated commitment |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| kind | frontmatter | rule |
|---|---|---|
| `contact` | `role` (required), `org`, `email` | a person you work with |
| `org` | `website` | a company or organisation |
| `project-record` | `record_status` (required), `owner` | one project's living record |
| `meeting-notes` | `date`, `attendees` | notes from one meeting |
| `followup` | `due_at`, `followup_status` (required), `owner` | a dated commitment |
| `decision-record` || must cite claims/sources |
| `voice` || must cite the examples it distills |
| kind | frontmatter | rule |
|---|---|---|
| `contact` | `role` (required), `org`, `email` | a person you work with |
| `org` | `website` | a company or organisation |
| `project-record` | `record_status` (required), `owner` | one project's living record |
| `meeting-notes` | `date`, `attendees` | notes from one meeting |
| `followup` | `due_at` (required), `followup_status` (required), `owner` | a dated commitment |
| `decision-record` || must cite claims/sources |
| `voice` || must cite the examples it distills |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/company-brain.md` around lines 21 - 29, The `followup` entry in the
company-brain table is missing a required-field marker for `due_at`, while
`followup_status` is already marked required. Update the `followup` row in the
table to show both `due_at` and `followup_status` as required, keeping the
wording aligned with the existing `kind`/`frontmatter`/`rule` structure so
readers see the full requirement set.

Comment thread src/vouch/cli.py
Comment on lines +1436 to +1437
@click.option("--url", required=True)
@click.option("--secret", default=None, help="Optional hmac secret (or env:VAR).")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win

--secret accepts raw secrets on the CLI.

Unlike webhook config secrets (which support env:VAR), notify test --secret lets a caller pass the literal secret value as a CLI argument, which lands in shell history and is visible via ps. Consider requiring env:VAR for this flag (mirroring the config-side convention) or documenting the risk in --help.

Also applies to: 1441-1444

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/cli.py` around lines 1436 - 1437, The `notify test` CLI option
`--secret` currently allows a literal secret to be passed directly, which should
be avoided. Update the `notify_test` command in `src/vouch/cli.py` so `--secret`
follows the same `env:VAR` convention used by the webhook config path, or
otherwise make the help text for `--secret` explicitly warn that raw values are
exposed in shell history and process listings. Make sure the handling around the
`--url`/`--secret` options and the `notify_test` flow consistently enforces or
documents this behavior.

Comment thread src/vouch/fetch.py
Comment on lines +64 to +138
def _addr_is_public(address: str) -> bool:
try:
ip = ipaddress.ip_address(address)
except ValueError:
return False
return not (
ip.is_private
or ip.is_loopback
or ip.is_link_local
or ip.is_multicast
or ip.is_reserved
or ip.is_unspecified
)


def _check_url(url: str, *, allow_private: bool) -> None:
parsed = urllib.parse.urlparse(url)
if parsed.scheme not in ("http", "https"):
raise FetchError(f"only http/https URLs can be snapshotted, got {url!r}")
host = parsed.hostname
if not host:
raise FetchError(f"URL has no host: {url!r}")
if allow_private:
return
try:
infos = socket.getaddrinfo(host, parsed.port or 0, proto=socket.IPPROTO_TCP)
except OSError as e:
raise FetchError(f"cannot resolve {host!r}: {e}") from e
for info in infos:
address = str(info[4][0])
if not _addr_is_public(address):
raise FetchError(
f"{host!r} resolves to non-public address {address} — refusing to fetch"
)


def fetch_url(
url: str,
*,
max_bytes: int = DEFAULT_MAX_BYTES,
timeout: float = DEFAULT_TIMEOUT,
allow_private: bool = False,
) -> FetchResult:
"""Fetch `url` with redirect re-validation and a byte cap.

allow_private skips the public-address check — used by tests that run a
loopback fixture server; production callers leave it False.
"""
current = url
for _ in range(MAX_REDIRECTS + 1):
_check_url(current, allow_private=allow_private)
req = urllib.request.Request(current, headers={"User-Agent": _USER_AGENT})
try:
resp = _opener.open(req, timeout=timeout)
except urllib.error.HTTPError as e:
if e.code in (301, 302, 303, 307, 308):
target = e.headers.get("Location")
e.close()
if not target:
raise FetchError(f"redirect from {current!r} without Location") from e
current = urllib.parse.urljoin(current, target)
continue
raise FetchError(f"HTTP {e.code} fetching {current!r}") from e
except OSError as e:
raise FetchError(f"fetch failed for {current!r}: {e}") from e
with resp:
body = resp.read(max_bytes + 1)
if len(body) > max_bytes:
raise FetchError(
f"{current!r} exceeds the {max_bytes}-byte snapshot cap"
)
content_type = resp.headers.get("Content-Type", "application/octet-stream")
media_type = content_type.split(";")[0].strip() or "application/octet-stream"
return FetchResult(content=body, final_url=current, media_type=media_type)
raise FetchError(f"too many redirects fetching {url!r}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n "fetch|source|trust" openclaw.plugin.json
rg -n "source_fetch|fetch_mod\." src/vouch/cli.py

Repository: vouchdev/vouch

Length of output: 389


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== openclaw.plugin.json =="
ast-grep outline openclaw.plugin.json --view expanded || true
nl -ba openclaw.plugin.json | sed -n '50,90p'

echo
echo "== src/vouch/fetch.py =="
ast-grep outline src/vouch/fetch.py --view expanded || true
nl -ba src/vouch/fetch.py | sed -n '1,220p'

echo
echo "== src/vouch/cli.py =="
ast-grep outline src/vouch/cli.py --view expanded || true
nl -ba src/vouch/cli.py | sed -n '1310,1360p'

Repository: vouchdev/vouch

Length of output: 254


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== openclaw.plugin.json =="
sed -n '1,220p' openclaw.plugin.json

echo
echo "== src/vouch/fetch.py (relevant range) =="
cat -n src/vouch/fetch.py | sed -n '1,220p'

echo
echo "== src/vouch/cli.py (relevant range) =="
cat -n src/vouch/cli.py | sed -n '1310,1365p'

Repository: vouchdev/vouch

Length of output: 11442


Pin the fetch transport to the vetted IP to close the DNS rebinding window. _check_url validates one resolution, but urllib can resolve again when opening the socket, so a hostile DNS change can still send the request to a private/internal address. Use a transport that connects to the already-approved IP while preserving Host/SNI, rather than relying on a second lookup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/fetch.py` around lines 64 - 138, The fetch path in
fetch_url/_check_url still relies on urllib to resolve the host again at
connection time, which leaves a DNS rebinding gap after the initial public-IP
validation. Update the transport used by fetch_url so it connects to the vetted
IP address returned by _check_url, while still sending the original hostname for
Host/SNI, and keep redirect re-validation intact. Use the existing symbols
_check_url, fetch_url, and _opener as the main places to wire in the
pinned-transport behavior.

Source: Coding guidelines

Comment thread src/vouch/fetch.py
Comment on lines +116 to +126
try:
resp = _opener.open(req, timeout=timeout)
except urllib.error.HTTPError as e:
if e.code in (301, 302, 303, 307, 308):
target = e.headers.get("Location")
e.close()
if not target:
raise FetchError(f"redirect from {current!r} without Location") from e
current = urllib.parse.urljoin(current, target)
continue
raise FetchError(f"HTTP {e.code} fetching {current!r}") from e

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Unclosed HTTPError response on non-redirect failures leaks the socket.

For redirect codes, e.close() is called before continuing; for every other HTTP error (4xx/5xx), e (which wraps the open response) is never closed before raise FetchError(...). Under repeated fetch attempts (e.g. via vouch source fetch retries, inbox/notify flows hitting broken links), this leaks file descriptors/sockets.

🔧 Proposed fix
             raise FetchError(f"HTTP {e.code} fetching {current!r}") from e

becomes:

+            e.close()
             raise FetchError(f"HTTP {e.code} fetching {current!r}") from e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
resp = _opener.open(req, timeout=timeout)
except urllib.error.HTTPError as e:
if e.code in (301, 302, 303, 307, 308):
target = e.headers.get("Location")
e.close()
if not target:
raise FetchError(f"redirect from {current!r} without Location") from e
current = urllib.parse.urljoin(current, target)
continue
raise FetchError(f"HTTP {e.code} fetching {current!r}") from e
try:
resp = _opener.open(req, timeout=timeout)
except urllib.error.HTTPError as e:
if e.code in (301, 302, 303, 307, 308):
target = e.headers.get("Location")
e.close()
if not target:
raise FetchError(f"redirect from {current!r} without Location") from e
current = urllib.parse.urljoin(current, target)
continue
e.close()
raise FetchError(f"HTTP {e.code} fetching {current!r}") from e
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/fetch.py` around lines 116 - 126, The non-redirect HTTPError path
in the fetch loop leaks the underlying response because only the redirect branch
closes e; update the exception handling in the fetch logic around _opener.open
in src/vouch/fetch.py so every caught urllib.error.HTTPError is closed before
raising FetchError. Keep the existing redirect handling in place, but ensure the
4xx/5xx branch also calls e.close() (or otherwise closes the response) before
re-raising, using the current fetch loop and current/target flow as the location
to fix.

Comment thread src/vouch/inbox.py
Comment on lines +94 to +142
def scan(store: KBStore, directory: Path, *, config: InboxConfig | None = None) -> ScanResult:
"""One pass: register + propose every new or changed eligible file."""
cfg = config or load_config(store)
if not cfg.enabled:
return ScanResult()

state = _load_state(store)
proposed: list[str] = []
skipped: list[str] = []

for path in sorted(p for p in directory.iterdir() if p.is_file()):
if path.suffix.lower() not in cfg.extensions:
skipped.append(path.name)
continue
# read through the same containment + O_NOFOLLOW hardening the MCP
# register_source_from_path entrypoint uses
resolved, data = store.read_under_root(path)
text = data.decode("utf-8", errors="replace")
if len(text.strip()) < cfg.min_chars:
skipped.append(path.name)
continue

source = store.put_source(
data,
title=path.name,
locator=str(resolved),
source_type="file",
media_type="text/markdown" if path.suffix.lower() == ".md" else "text/plain",
tags=["inbox"],
)
if state.get(str(resolved)) == source.id:
skipped.append(path.name)
continue

proposal = propose_page(
store,
title=f"inbox: {path.stem}",
body=text,
page_type="log",
source_ids=[source.id],
proposed_by=INBOX_ACTOR,
rationale=f"imported from inbox file {path.name}; distill durable facts into claims",
tags=["inbox"],
)
state[str(resolved)] = source.id
proposed.append(proposal.id)

_save_state(store, state)
return ScanResult(proposed=proposed, skipped=skipped)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

State persisted only once at end of loop — partial failures cause duplicate re-proposals.

state is only written via _save_state after the entire for loop completes (line 141). If propose_page or store.put_source raises partway through (e.g. an unexpected validation error on one file), all files successfully processed earlier in the same scan() call never get their dedup key persisted. The next scan will treat them as new again, re-registering sources and re-proposing pages for content already proposed — breaking the idempotency contract this module's docstring promises and test_scan_seen_state_dedups_and_rescans_changed_files verifies for the happy path.

🛠️ Proposed fix: persist state incrementally
     state = _load_state(store)
     proposed: list[str] = []
     skipped: list[str] = []
 
-    for path in sorted(p for p in directory.iterdir() if p.is_file()):
-        ...
-        state[str(resolved)] = source.id
-        proposed.append(proposal.id)
-
-    _save_state(store, state)
-    return ScanResult(proposed=proposed, skipped=skipped)
+    try:
+        for path in sorted(p for p in directory.iterdir() if p.is_file()):
+            ...
+            state[str(resolved)] = source.id
+            proposed.append(proposal.id)
+    finally:
+        _save_state(store, state)
+    return ScanResult(proposed=proposed, skipped=skipped)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def scan(store: KBStore, directory: Path, *, config: InboxConfig | None = None) -> ScanResult:
"""One pass: register + propose every new or changed eligible file."""
cfg = config or load_config(store)
if not cfg.enabled:
return ScanResult()
state = _load_state(store)
proposed: list[str] = []
skipped: list[str] = []
for path in sorted(p for p in directory.iterdir() if p.is_file()):
if path.suffix.lower() not in cfg.extensions:
skipped.append(path.name)
continue
# read through the same containment + O_NOFOLLOW hardening the MCP
# register_source_from_path entrypoint uses
resolved, data = store.read_under_root(path)
text = data.decode("utf-8", errors="replace")
if len(text.strip()) < cfg.min_chars:
skipped.append(path.name)
continue
source = store.put_source(
data,
title=path.name,
locator=str(resolved),
source_type="file",
media_type="text/markdown" if path.suffix.lower() == ".md" else "text/plain",
tags=["inbox"],
)
if state.get(str(resolved)) == source.id:
skipped.append(path.name)
continue
proposal = propose_page(
store,
title=f"inbox: {path.stem}",
body=text,
page_type="log",
source_ids=[source.id],
proposed_by=INBOX_ACTOR,
rationale=f"imported from inbox file {path.name}; distill durable facts into claims",
tags=["inbox"],
)
state[str(resolved)] = source.id
proposed.append(proposal.id)
_save_state(store, state)
return ScanResult(proposed=proposed, skipped=skipped)
def scan(store: KBStore, directory: Path, *, config: InboxConfig | None = None) -> ScanResult:
"""One pass: register + propose every new or changed eligible file."""
cfg = config or load_config(store)
if not cfg.enabled:
return ScanResult()
state = _load_state(store)
proposed: list[str] = []
skipped: list[str] = []
try:
for path in sorted(p for p in directory.iterdir() if p.is_file()):
if path.suffix.lower() not in cfg.extensions:
skipped.append(path.name)
continue
# read through the same containment + O_NOFOLLOW hardening the MCP
# register_source_from_path entrypoint uses
resolved, data = store.read_under_root(path)
text = data.decode("utf-8", errors="replace")
if len(text.strip()) < cfg.min_chars:
skipped.append(path.name)
continue
source = store.put_source(
data,
title=path.name,
locator=str(resolved),
source_type="file",
media_type="text/markdown" if path.suffix.lower() == ".md" else "text/plain",
tags=["inbox"],
)
if state.get(str(resolved)) == source.id:
skipped.append(path.name)
continue
proposal = propose_page(
store,
title=f"inbox: {path.stem}",
body=text,
page_type="log",
source_ids=[source.id],
proposed_by=INBOX_ACTOR,
rationale=f"imported from inbox file {path.name}; distill durable facts into claims",
tags=["inbox"],
)
state[str(resolved)] = source.id
proposed.append(proposal.id)
finally:
_save_state(store, state)
return ScanResult(proposed=proposed, skipped=skipped)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/inbox.py` around lines 94 - 142, Persist inbox scan state
incrementally inside scan() instead of only at the end, so successfully
processed files are remembered even if a later put_source or propose_page call
fails. Update the state immediately after a file’s source/proposal is accepted
in the loop, using _save_state alongside _load_state and the existing
state[str(resolved)] tracking, while keeping ScanResult behavior unchanged. This
keeps scan() idempotent across partial failures and preserves deduplication for
already-processed files.

Comment thread src/vouch/jsonl_server.py
Comment on lines +251 to +259
def _h_list_pages(p: dict) -> list[dict]:
pages = filter_pages(
_store().list_pages(),
kind=p.get("type"),
equals=p.get("meta"),
before=p.get("meta_before"),
after=p.get("meta_after"),
)
return [pg.model_dump(mode="json") for pg in pages]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

kb.list_pages payload diverges from the MCP surface's contract.

_h_list_pages returns pg.model_dump(mode="json") (the full Page model), whereas server.py's kb_list_pages returns a curated {id, title, type, tags, metadata} dict for the same method name. This means the JSONL transport exposes strictly more data (likely full page body/content) for a "list" operation than MCP does, which is both a payload-size and information-exposure inconsistency across transports for an identically-named method.

♻️ Proposed fix to align with server.py's contract
 def _h_list_pages(p: dict) -> list[dict]:
     pages = filter_pages(
         _store().list_pages(),
         kind=p.get("type"),
         equals=p.get("meta"),
         before=p.get("meta_before"),
         after=p.get("meta_after"),
     )
-    return [pg.model_dump(mode="json") for pg in pages]
+    return [
+        {"id": pg.id, "title": pg.title, "type": pg.type, "tags": pg.tags, "metadata": pg.metadata}
+        for pg in pages
+    ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _h_list_pages(p: dict) -> list[dict]:
pages = filter_pages(
_store().list_pages(),
kind=p.get("type"),
equals=p.get("meta"),
before=p.get("meta_before"),
after=p.get("meta_after"),
)
return [pg.model_dump(mode="json") for pg in pages]
def _h_list_pages(p: dict) -> list[dict]:
pages = filter_pages(
_store().list_pages(),
kind=p.get("type"),
equals=p.get("meta"),
before=p.get("meta_before"),
after=p.get("meta_after"),
)
return [
{"id": pg.id, "title": pg.title, "type": pg.type, "tags": pg.tags, "metadata": pg.metadata}
for pg in pages
]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/jsonl_server.py` around lines 251 - 259, The kb.list_pages JSONL
handler in _h_list_pages is returning the full Page model instead of matching
the MCP surface contract used by kb_list_pages. Update the return shape to a
curated dict with only the same fields as server.py’s kb_list_pages (for example
id, title, type, tags, metadata) and avoid exposing the full page body/content
through _store().list_pages() results.

Comment thread src/vouch/notify.py
Comment on lines +84 to +121
def load_webhooks(store: KBStore) -> list[Webhook]:
try:
loaded = yaml.safe_load(store.config_path.read_text(encoding="utf-8"))
except (OSError, yaml.YAMLError):
return []
if not isinstance(loaded, dict):
return []
raw = loaded.get("notify")
if not isinstance(raw, dict):
return []
hooks: list[Webhook] = []
for item in raw.get("webhooks") or []:
if not isinstance(item, dict) or not item.get("url"):
raise NotifyConfigError("notify.webhooks entries need a url")
events = item.get("events")
hooks.append(
Webhook(
url=_resolve_env(str(item["url"]), what="notify.webhooks.url"),
events=(
tuple(str(e) for e in events) if isinstance(events, list) else ALL_EVENTS
),
backlog_threshold=(
int(item["backlog_threshold"])
if item.get("backlog_threshold") is not None
else None
),
age_threshold=(
str(item["age_threshold"]) if item.get("age_threshold") else None
),
secret=(
_resolve_env(str(item["secret"]), what="notify.webhooks.secret")
if item.get("secret")
else None
),
include_summary=bool(item.get("include_summary", False)),
)
)
return hooks

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Numeric config fields aren't validated consistently.

int(item["backlog_threshold"]) (line 106) has no error handling, unlike the url check above it which raises a clean NotifyConfigError. A typo like backlog_threshold: "many" in YAML will surface as a raw, unhandled ValueError instead of a consistent config error.

🛠️ Proposed fix
+def _parse_int(raw: object, *, what: str) -> int | None:
+    if raw is None:
+        return None
+    try:
+        return int(raw)
+    except (TypeError, ValueError) as e:
+        raise NotifyConfigError(f"{what} must be an integer") from e
+
+
 def load_webhooks(store: KBStore) -> list[Webhook]:
     ...
-                backlog_threshold=(
-                    int(item["backlog_threshold"])
-                    if item.get("backlog_threshold") is not None
-                    else None
-                ),
+                backlog_threshold=_parse_int(
+                    item.get("backlog_threshold"), what="notify.webhooks.backlog_threshold"
+                ),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def load_webhooks(store: KBStore) -> list[Webhook]:
try:
loaded = yaml.safe_load(store.config_path.read_text(encoding="utf-8"))
except (OSError, yaml.YAMLError):
return []
if not isinstance(loaded, dict):
return []
raw = loaded.get("notify")
if not isinstance(raw, dict):
return []
hooks: list[Webhook] = []
for item in raw.get("webhooks") or []:
if not isinstance(item, dict) or not item.get("url"):
raise NotifyConfigError("notify.webhooks entries need a url")
events = item.get("events")
hooks.append(
Webhook(
url=_resolve_env(str(item["url"]), what="notify.webhooks.url"),
events=(
tuple(str(e) for e in events) if isinstance(events, list) else ALL_EVENTS
),
backlog_threshold=(
int(item["backlog_threshold"])
if item.get("backlog_threshold") is not None
else None
),
age_threshold=(
str(item["age_threshold"]) if item.get("age_threshold") else None
),
secret=(
_resolve_env(str(item["secret"]), what="notify.webhooks.secret")
if item.get("secret")
else None
),
include_summary=bool(item.get("include_summary", False)),
)
)
return hooks
def _parse_int(raw: object, *, what: str) -> int | None:
if raw is None:
return None
try:
return int(raw)
except (TypeError, ValueError) as e:
raise NotifyConfigError(f"{what} must be an integer") from e
def load_webhooks(store: KBStore) -> list[Webhook]:
try:
loaded = yaml.safe_load(store.config_path.read_text(encoding="utf-8"))
except (OSError, yaml.YAMLError):
return []
if not isinstance(loaded, dict):
return []
raw = loaded.get("notify")
if not isinstance(raw, dict):
return []
hooks: list[Webhook] = []
for item in raw.get("webhooks") or []:
if not isinstance(item, dict) or not item.get("url"):
raise NotifyConfigError("notify.webhooks entries need a url")
events = item.get("events")
hooks.append(
Webhook(
url=_resolve_env(str(item["url"]), what="notify.webhooks.url"),
events=(
tuple(str(e) for e in events) if isinstance(events, list) else ALL_EVENTS
),
backlog_threshold=_parse_int(
item.get("backlog_threshold"), what="notify.webhooks.backlog_threshold"
),
age_threshold=(
str(item["age_threshold"]) if item.get("age_threshold") else None
),
secret=(
_resolve_env(str(item["secret"]), what="notify.webhooks.secret")
if item.get("secret")
else None
),
include_summary=bool(item.get("include_summary", False)),
)
)
return hooks
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/notify.py` around lines 84 - 121, The numeric webhook config
parsing in load_webhooks is not handled consistently: converting
backlog_threshold directly with int(...) can raise a raw ValueError. Update the
webhook construction logic in load_webhooks so invalid backlog_threshold values
are caught and re-raised as NotifyConfigError with a clear message, matching the
existing validation style used for notify.webhooks entries and _resolve_env
fields.

Comment thread src/vouch/notify.py
Comment on lines +124 to +148
def deliver(
hook_url: str,
envelope: dict[str, object],
*,
secret: str | None = None,
timeout: float = DEFAULT_TIMEOUT,
) -> bool:
"""POST the envelope; log-and-swallow every failure. Returns success."""
body = json.dumps(envelope, sort_keys=True).encode("utf-8")
headers = {"Content-Type": "application/json", "User-Agent": "vouch-notify"}
if secret:
headers[SIGNATURE_HEADER] = hmac.new(
secret.encode("utf-8"), body, hashlib.sha256
).hexdigest()
req = urllib.request.Request(hook_url, data=body, headers=headers, method="POST")
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
ok = 200 <= resp.status < 300
except (urllib.error.URLError, OSError, ValueError) as e:
logger.warning("notify delivery to %s failed: %s", hook_url, e)
return False
if not ok:
logger.warning("notify delivery to %s got HTTP %s", hook_url, resp.status)
return ok

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

python3 - <<'PY'
import urllib.request, tempfile, pathlib, os, json

# probe file:// behavior
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = pathlib.Path(f.name).resolve()

url = path.as_uri()
print("URL:", url)

req = urllib.request.Request(url, method="GET")
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print("TYPE:", type(resp).__name__)
        print("HAS_STATUS:", hasattr(resp, "status"))
        try:
            print("STATUS:", resp.status)
        except Exception as e:
            print("STATUS_ERR:", type(e).__name__, e)
        try:
            print("READ:", resp.read(20))
        except Exception as e:
            print("READ_ERR:", type(e).__name__, e)
except Exception as e:
    print("ERR:", type(e).__name__, e)
finally:
    os.unlink(path)
PY

Repository: vouchdev/vouch

Length of output: 243


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n## notify.py\n'
sed -n '1,220p' src/vouch/notify.py

printf '\n## fetch.py\n'
if [ -f src/vouch/fetch.py ]; then
  sed -n '1,260p' src/vouch/fetch.py
else
  echo 'src/vouch/fetch.py not found'
fi

Repository: vouchdev/vouch

Length of output: 12984


🏁 Script executed:

python3 - <<'PY'
import urllib.request, tempfile, pathlib, os

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'hello')
    path = pathlib.Path(f.name).resolve()

url = path.as_uri()
print("URL:", url)

for method, data in [("GET", None), ("POST", b'{}')]:
    req = urllib.request.Request(url, data=data, method=method)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            print(method, "TYPE:", type(resp).__name__)
            print(method, "STATUS:", getattr(resp, "status", "<missing>"))
            print(method, "READ:", resp.read(20))
    except Exception as e:
        print(method, "ERR:", type(e).__name__, e)

os.unlink(path)
PY

Repository: vouchdev/vouch

Length of output: 298


Restrict webhook URLs to http/https
deliver() passes hook_url straight to urllib.request.urlopen(), and file:// URLs are accepted there even for POST, which can read local files. Mirror fetch.py’s scheme allowlist before dispatching.

🧰 Tools
🪛 ast-grep (0.44.0)

[info] 131-131: use jsonify instead of json.dumps for JSON output
Context: json.dumps(envelope, sort_keys=True)
Note: [CWE-116] Improper Encoding or Escaping of Output.

(use-jsonify)


[warning] 139-139: Request-controlled URL passed to urlopen; validate against an allowlist to prevent SSRF.
Context: urllib.request.urlopen(req, timeout=timeout)
Note: [CWE-918] Server-Side Request Forgery (SSRF).

(urlopen-unsanitized-data)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/notify.py` around lines 124 - 148, The deliver() webhook sender
currently forwards hook_url directly into urllib.request.urlopen(), which can
allow गैर-http schemes like file://; add the same scheme allowlist used in
fetch.py before building the Request or calling urlopen so only http and https
URLs are accepted. Update deliver() to validate hook_url early, log-and-return
False on disallowed schemes, and keep the rest of the POST flow unchanged.

Comment thread src/vouch/notify.py
Comment on lines +189 to +280
def sweep(store: KBStore, *, now: datetime | None = None) -> list[str]:
"""Evaluate triggers against the pending queue and fire subscribed hooks.

Idempotent per (event, proposal id): a re-run fires nothing new unless
the queue changed. Returns the event names fired (with repeats per hook).
"""
hooks = load_webhooks(store)
if not hooks:
return []
now = now or datetime.now(UTC)
pending = store.list_proposals(ProposalStatus.PENDING)
pending_ids = [p.id for p in pending]
state = _load_state(store.kb_dir)
fired: list[str] = []

for hook in hooks:
if EVENT_CREATED in hook.events:
new_ids = [pid for pid in pending_ids if pid not in state["created"]]
if new_ids:
summaries = (
[
str(p.payload.get("title") or p.payload.get("text") or "")[:120]
for p in pending
if p.id in new_ids
]
if hook.include_summary
else None
)
if deliver(
hook.url,
_envelope(
store, EVENT_CREATED,
proposal_ids=new_ids, pending_count=len(pending),
now=now, summaries=summaries,
),
secret=hook.secret,
):
fired.append(EVENT_CREATED)

if (
EVENT_BACKLOGGED in hook.events
and hook.backlog_threshold is not None
and len(pending) >= hook.backlog_threshold
):
marker = f"{hook.url}:{hook.backlog_threshold}"
if marker not in state["backlogged"] and deliver(
hook.url,
_envelope(
store, EVENT_BACKLOGGED,
proposal_ids=pending_ids, pending_count=len(pending), now=now,
),
secret=hook.secret,
):
fired.append(EVENT_BACKLOGGED)
state["backlogged"].append(marker)

if EVENT_AGED in hook.events and hook.age_threshold:
cutoff = parse_since(hook.age_threshold, now=now)
aged_ids = [
p.id
for p in pending
if cutoff is not None
and (
p.proposed_at.replace(tzinfo=UTC)
if p.proposed_at.tzinfo is None
else p.proposed_at
)
< cutoff
and p.id not in state["aged"]
]
if aged_ids and deliver(
hook.url,
_envelope(
store, EVENT_AGED,
proposal_ids=aged_ids, pending_count=len(pending), now=now,
),
secret=hook.secret,
):
fired.append(EVENT_AGED)
state["aged"].extend(aged_ids)

# created-state is hook-independent: a proposal counts as announced once
# any hook accepted it. drop decided proposals so the state can't grow
# without bound.
if any(e == EVENT_CREATED for e in fired):
state["created"] = sorted(set(state["created"]) | set(pending_ids))
state["created"] = [pid for pid in state["created"] if pid in pending_ids]
state["aged"] = [pid for pid in state["aged"] if pid in pending_ids]
if not pending:
state["backlogged"] = []
_save_state(store.kb_dir, state)
return fired

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Multi-webhook created state is hook-independent — a failing hook silently loses its retry window.

Per the comment at lines 270-272, a proposal is marked "created" globally once any hook accepts delivery of EVENT_CREATED. With two configured webhooks, if hook A succeeds and hook B's deliver() fails in the same sweep (dead endpoint, transient network error), pending_ids still gets folded into state["created"] at line 274 because fired contains at least one EVENT_CREATED entry. Hook B will never be retried for those proposal ids — even though its own delivery attempt failed — because the state doesn't track per-hook delivery outcomes.

This contradicts the module's "best-effort... dead endpoint can never wedge a sweep" framing for delivery, since here a dead endpoint permanently loses (not just delays) that notification once a sibling hook succeeds. No test exercises multiple webhooks with mixed success/failure to catch this.

Consider keying state["created"]/state["aged"] per-hook (e.g. dict[hook_url, list[str]]) so each webhook independently tracks and retries its own undelivered events.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/notify.py` around lines 189 - 280, The sweep() logic in notify.py
treats EVENT_CREATED as a single global state, so one webhook’s success can
clear retry eligibility for another webhook that failed. Update sweep() and its
state handling so created/aged delivery tracking is per-hook (for example keyed
by hook.url) instead of shared across all hooks, and only mark proposal ids
delivered for the specific hook after that hook’s deliver() succeeds. Preserve
the existing idempotent behavior, but make _load_state/_save_state and the
created/aged state updates reflect each webhook’s independent delivery outcome.

Comment thread src/vouch/onboarding.py
Comment on lines +429 to +466
def _merge_page_kinds(store: KBStore, kinds: dict[str, dict[str, object]]) -> list[str]:
"""Add missing `page_kinds` entries to config.yaml; return the names added.

When the file has no `page_kinds` key at all the block is appended
textually so operator comments and formatting survive. When the key
exists the file is parsed, missing kinds are filled in, and the mapping
is re-dumped (a yaml round-trip drops comments — the narrow price of a
real merge).
"""
path = store.kb_dir / "config.yaml"
text = path.read_text(encoding="utf-8") if path.exists() else ""
try:
loaded = yaml.safe_load(text) or {}
except yaml.YAMLError as e:
raise ValueError(f"config.yaml is not valid yaml; fix it first: {e}") from e
if not isinstance(loaded, dict):
raise ValueError("config.yaml must be a yaml mapping")

existing = loaded.get("page_kinds")
if not isinstance(existing, dict):
block = yaml.safe_dump(
{"page_kinds": kinds}, sort_keys=False, default_flow_style=False
)
prefix = text.rstrip("\n")
joined = f"{prefix}\n\n{block}" if prefix else block
path.write_text(joined, encoding="utf-8")
return sorted(kinds)

added = sorted(k for k in kinds if k not in existing)
if not added:
return []
for name in added:
existing[name] = kinds[name]
path.write_text(
yaml.safe_dump(loaded, sort_keys=False, default_flow_style=False),
encoding="utf-8",
)
return added

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Map the file and inspect the relevant function with line numbers.
ast-grep outline src/vouch/onboarding.py --view expanded || true
printf '\n--- relevant slice ---\n'
sed -n '380,500p' src/vouch/onboarding.py | cat -n

printf '\n--- search for page_kinds handling ---\n'
rg -n "page_kinds|_merge_page_kinds|config.yaml" src/vouch/onboarding.py src -S

Repository: vouchdev/vouch

Length of output: 13086


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n--- page_kinds reader ---\n'
sed -n '220,280p' src/vouch/page_kinds.py | cat -n

printf '\n--- any validation helpers mentioning page_kinds ---\n'
rg -n "page_kinds.*must|page_kinds.*mapping|load_page_kind_registry|validate_page|PageKindError" src/vouch/page_kinds.py src/vouch -S

Repository: vouchdev/vouch

Length of output: 4630


🏁 Script executed:

python3 - <<'PY'
import yaml

doc = "page_kinds: old\n\npage_kinds:\n  new: {}\n"
print(yaml.safe_load(doc))
PY

Repository: vouchdev/vouch

Length of output: 180


Treat malformed page_kinds as an error

A present-but-non-mapping page_kinds currently goes down the same path as a missing key and appends a second top-level page_kinds: block. On reload, the later block wins and the original value is dropped, so fail fast instead of silently shadowing bad config.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vouch/onboarding.py` around lines 429 - 466, The _merge_page_kinds helper
in onboarding.py is treating a present-but-invalid page_kinds value the same as
a missing key, which can silently append a second top-level block and shadow the
original config. Update the page_kinds handling so that only a missing key
triggers the append/merge path, and any existing non-dict value raises a
ValueError instead. Keep the fix inside _merge_page_kinds, using the existing
loaded/existing checks and preserving the current merge behavior for valid
mappings.

ruff flagged the two host-wrapper transcript entries at over 100
columns (e501); reformat only, no behaviour change.
Copilot AI review requested due to automatic review settings July 4, 2026 04:18

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR turns a vouch KB into a “company brain” workflow while preserving (and in one place tightening) the review gate: it introduces config-declared typed page kinds, adds read-only viewports for listing/querying pages and generating a reviewer digest, and adds operator-run intake/notification utilities that never reach approval paths.

Changes:

  • Add the company-brain onboarding template (vouch init --template company-brain) that merges typed page_kinds into config idempotently and seeds an approved guide page.
  • Add read-only viewports: frontmatter filters for kb.list_pages / vouch pages, plus kb.digest / vouch digest as a reviewer briefing.
  • Add intake + proactivity plumbing: vouch source fetch, vouch inbox, vouch notify, plus adapter slash commands/skills and capture title improvements.

Reviewed changes

Copilot reviewed 41 out of 43 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/test_page_kinds.py Tests for YAML date scalars in string schema + protected self-approval behavior
tests/test_page_filters.py Tests for page filters across CLI/MCP/JSONL surfaces
tests/test_openclaw_plugin_manifest.py Extends plugin manifest skill-name expectations
tests/test_onboarding.py Tests for template registration, idempotence, and config merge behavior
tests/test_notify.py Tests for webhook sweep semantics, signing, idempotence, retries
tests/test_install_adapter.py Verifies new Claude Code command files are installed/idempotent
tests/test_inbox.py Tests inbox scan/watch behavior and approve-path guardrails
tests/test_fetch.py Tests URL snapshot safety rules (scheme, redirects, cap, private-host guard)
tests/test_digest.py Tests digest composition, formatting, and read-only guarantee
tests/test_capture.py Tests transcript-based session title extraction and fallbacks
tests/test_brain_commands.py Guards for “never approve” instructions + skills existence + intake no-approve imports
src/vouch/server.py Adds kb.digest; extends kb.list_pages with filters + metadata
src/vouch/proposals.py Enforces protected page kinds against trusted-agent self-approval
src/vouch/page_kinds.py Adds protected-kind flag + accepts YAML date/datetime scalars for type: string
src/vouch/page_filters.py New shared filter implementation for page listings
src/vouch/onboarding.py Adds company-brain template, config merge logic, seeded guide artifacts
src/vouch/notify.py New outbound webhook notifier (sweep/test, env secrets, HMAC signing)
src/vouch/jsonl_server.py Registers kb.digest; adds list_pages filters in JSONL handler
src/vouch/inbox.py New inbox importer (file drop → source + pending page proposal)
src/vouch/fetch.py New URL snapshotter into content-addressed sources
src/vouch/digest.py New read-only digest builder + text/markdown renderers
src/vouch/cli.py Adds init templates, digest/pages/source fetch/inbox/notify commands; schema list shows protected
src/vouch/capabilities.py Registers kb.digest capability
package.json Bumps OpenClaw packaging version
openclaw.plugin.json Updates plugin manifest fields and skills root
examples/decision-log/README.md Updates example narrative to “decision evolution” framing
docs/img/examples/render.py Updates rendered example commands/args
docs/img/examples/decision-log-search.svg Updated rendered example output
docs/img/examples/decision-log-diff.svg Updated rendered diff example output
docs/company-brain.md New documentation for the workflow, tools, and constraints
CHANGELOG.md Documents new workflow features under Unreleased
adapters/openclaw/skills/vouch-standup/SKILL.md New OpenClaw skill spec
adapters/openclaw/skills/vouch-remember/SKILL.md New OpenClaw skill spec
adapters/openclaw/skills/vouch-record/SKILL.md New OpenClaw skill spec
adapters/openclaw/skills/vouch-followup/SKILL.md New OpenClaw skill spec
adapters/openclaw/skills/vouch-ask/SKILL.md New OpenClaw skill spec
adapters/claude-code/install.yaml Installs the new Claude Code commands
adapters/claude-code/.claude/commands/vouch-standup.md New Claude Code slash command
adapters/claude-code/.claude/commands/vouch-remember.md New Claude Code slash command
adapters/claude-code/.claude/commands/vouch-record.md New Claude Code slash command
adapters/claude-code/.claude/commands/vouch-followup.md New Claude Code slash command
adapters/claude-code/.claude/commands/vouch-ask.md New Claude Code slash command

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/vouch/onboarding.py
Comment on lines +447 to +455
existing = loaded.get("page_kinds")
if not isinstance(existing, dict):
block = yaml.safe_dump(
{"page_kinds": kinds}, sort_keys=False, default_flow_style=False
)
prefix = text.rstrip("\n")
joined = f"{prefix}\n\n{block}" if prefix else block
path.write_text(joined, encoding="utf-8")
return sorted(kinds)
Comment thread src/vouch/jsonl_server.py
Comment on lines +251 to +259
def _h_list_pages(p: dict) -> list[dict]:
pages = filter_pages(
_store().list_pages(),
kind=p.get("type"),
equals=p.get("meta"),
before=p.get("meta_before"),
after=p.get("meta_after"),
)
return [pg.model_dump(mode="json") for pg in pages]
Comment thread src/vouch/server.py
Comment on lines 333 to 336
return [
{"id": p.id, "title": p.title, "type": p.type, "tags": p.tags}
for p in _store().list_pages()
{"id": p.id, "title": p.title, "type": p.type, "tags": p.tags, "metadata": p.metadata}
for p in pages
]
Comment thread src/vouch/inbox.py
Comment on lines +11 to +14
A content-hash-keyed seen-state sidecar (`.vouch/inbox-state.json`) makes
re-runs cheap and idempotent: an unchanged file is skipped, an edited file
re-proposes. The watch loop is a bounded stdlib poll (the `vault_sync`
precedent) — no daemon, no watchdog dependency.
Comment thread src/vouch/notify.py
Comment on lines +204 to +207
for hook in hooks:
if EVENT_CREATED in hook.events:
new_ids = [pid for pid in pending_ids if pid not in state["created"]]
if new_ids:
@plind-junior plind-junior merged commit c5267d4 into test Jul 4, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

adapters agent host adapters and install manifests cli command line interface docs documentation, specs, examples, and repo guidance mcp mcp, jsonl, and http surfaces openclaw openclaw integration size: XL 1000 or more changed non-doc lines storage kb storage, migrations, schemas, and proposals tests tests and fixtures

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants