feat: company brain — typed records, digest, intake, notify, readable captures#356
Conversation
declares record kinds (contact, org, project-record, meeting-notes, followup, decision-record, voice) as ordinary page_kinds config so both gates validate them, and seeds a cited, approved guide page describing the conventions. merging into config.yaml is additive: operator-declared kinds win, and when the file has no page_kinds key the block is appended textually so comments survive. idempotent via stable artifact ids.
…stry templates layer on top of the starter seed instead of replacing it, so an init'd kb always carries the walkthrough basics. click.Choice keeps the flag self-documenting and rejects unknown names before any i/o.
the natural-language layer for a team-memory kb lives host-side as prompts: ask answers only with citations, remember registers the user's words as a source then proposes claims citing it, record files entity + typed-page pairs, followup files dated commitments. every flow terminates at kb_propose_* — none may call kb_approve. registered in the claude-code install manifest (T3) and the plugin skills array.
typed record kinds put their structure in page frontmatter; this adds one shared filter vocabulary over it — kind equality, field equality, and inclusive ordered bounds (numbers, iso dates) — as a viewport over store.list_pages(). deliberately not a query language: anything richer belongs in the caller. same kb.list_pages method, new optional params, plus a human mirror: vouch pages --kind followup --before due_at=...
…e followups an operator returning to a kb reconstructs "what needs me" from several commands; vouch digest folds them into one glance — pending proposals oldest-first, decisions in the window (from the decided/ records), stale claims behind the count metrics already reports, and open followup pages whose due_at has arrived. strictly a viewport: composes list_proposals, list_claims, list_pages and metrics.compute, writes nothing, logs no audit event. registered at all four surfaces as kb.digest plus a /vouch-standup slash command that narrates it.
yaml parses a bare due_at: 2026-07-01 as a date object — from the cli --meta parser and from every frontmatter disk round-trip — so a type: string schema field either rejects the natural spelling at propose time or, worse, fails re-validation at approve time for a page that validated when proposed. treat date/datetime as string-compatible scalars; genuinely wrong types still fail.
one page covering the template, the record conventions, frontmatter queries, the digest cron loop, and the slash commands — plus the honest constraint that the review queue is the bottleneck by design.
evidence intake for web content: fetch the exact bytes once, register them content-addressed, and let claims cite the immutable snapshot id so the evidence a reviewer approved against survives the live page drifting. conservative defaults for the first outbound network call in the intake path: http/https only, every redirect hop re-validated, hosts must resolve to public addresses, 2mib body cap, raw bytes only. fetched_at and final_url recorded in source metadata; audit event source.fetch. never imports proposals — sources are evidence, not knowledge.
each new .md/.txt file in the folder is registered as a content-addressed source and rolled into exactly one pending page proposal citing it — the capture.finalize shape with a dropped file as the trigger instead of a hook payload. content-hash seen-state sidecar makes re-runs idempotent; edited files re-propose. --watch is a bounded stdlib poll, no daemon. reads go through read_under_root, and an ast test pins that the module never imports an approve path.
the gate only works if a human notices it has work. vouch notify sweep evaluates the pending queue against config-declared webhooks — new proposals, a backlog crossing its threshold, proposals aged past a cutoff — and posts a small json envelope (optionally hmac-signed, secrets via the env: convention serve.bearer_token uses). delivery is best-effort and idempotent per (event, proposal); state lives in the derived state.db so losing it can only re-notify. read-and-notify only: no path here proposes, approves, or edits. protected page kinds tighten the gate the one direction invariants welcome: a kind with protected: true is exempt from the trusted-agent self-approval opt-out, so policy-bearing pages (voice, decision records — both marked in the company-brain template) always need a reviewer other than the proposer.
the nl layer lives host-side as prompt files, so the strongest deterministic guarantee is textual: every brain command must keep its explicit never-approve instruction, every manifest skills path must exist, and an ast walk pins that fetch/inbox/notify have no import or attribute path to approve(). plus docs and changelog for the intake and notification features.
the queue used to show 27 proposals all titled "session summary: <project> (<uuid>)" — unreviewable at a glance. the sessionend hook payload already carries transcript_path, so finalize now lifts the human's first genuine prompt from the transcript (pure extraction — host wrapper messages skipped, no model involved, capture's no-llm rule holds) and titles the proposal "session: <prompt excerpt>", with the prompt quoted in a new body section. without a transcript the title falls back to what changed: "session <date>: web, docs — 9 file(s)". the uuid stays in the body for traceability.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR adds a "company brain" knowledge workflow: an onboarding template with typed/protected page kinds, self-approval gating, frontmatter-based page filtering, a read-only digest command, content-addressed URL/source fetch, inbox file ingestion, HMAC-signed webhook notifications, transcript-derived capture titles, new Claude slash commands, and supporting docs and tests. ChangesCompany Brain Feature Set
Estimated code review effort: 4 (Complex) | ~75 minutes Sequence Diagram(s)sequenceDiagram
participant CLI as vouch inbox
participant Inbox as inbox.scan
participant Store as KBStore
CLI->>Inbox: scan(store, directory, config)
Inbox->>Store: read_under_root, put_source
Inbox->>Store: propose_page (if new/changed)
Store-->>Inbox: proposal id
Inbox-->>CLI: ScanResult(proposed, skipped)
sequenceDiagram
participant CLI as vouch notify sweep
participant Notify as notify.sweep
participant Store as KBStore
participant Webhook as Webhook endpoint
CLI->>Notify: sweep(store, now)
Notify->>Store: load_webhooks, list pending proposals
Notify->>Webhook: deliver(envelope, secret)
Webhook-->>Notify: HTTP status
Notify->>Store: save idempotence state
Notify-->>CLI: fired event names
Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
resolve openclaw.plugin.json to the 2026.6 loader dialect: id/kind/ version plus a single json-schema configSchema, skills as directories. publish the five company-brain commands as adapters/openclaw/skills/<name>/SKILL.md and extend the manifest test's skill list to all nine. bump package.json to 1.1.0 so the version lockstep test holds against pyproject. repoint the decision-log screenshot shots at claims that exist in the reworked example fixture and re-render the svgs.
There was a problem hiding this comment.
Actionable comments posted: 12
🧹 Nitpick comments (3)
tests/test_brain_commands.py (1)
44-55: 🩺 Stability & Availability | 🔵 Trivial | 💤 Low valueAST guard covers attribute-style calls but not
getattr-based indirection.The check catches
from X import approveand.approveattribute access, but a call likegetattr(life, "approve")(...)would bypass both checks. Given this is a defense-in-depth test rather than the sole safeguard (the real gate is_approval_block_reasoninproposals.py), this is a low-priority gap.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_brain_commands.py` around lines 44 - 55, The AST guard in test_intake_modules_have_no_approve_import only checks ImportFrom names and direct .approve attributes, so it can miss getattr-based access. Update the test to also scan for getattr(...) calls that use the string "approve" (and any similar indirection) in the same module walk, using the existing test_intake_modules_have_no_approve_import logic as the place to extend the safeguard.src/vouch/cli.py (1)
1435-1447: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAccessing a private helper across module boundary.
notify_mod._resolve_envis underscore-prefixed (private tonotify.py) yet called directly fromcli.py. Consider exposing a publicresolve_env/resolve_secrethelper for cross-module use, matching how othernotify_mod/fetch_modfunctions are consumed as public API elsewhere in this file.♻️ Suggested fix
-def _resolve_env(raw: str, *, what: str) -> str: +def resolve_env(raw: str, *, what: str) -> str:And update the call site to
notify_mod.resolve_env(...).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/cli.py` around lines 1435 - 1447, The notify_test command in cli.py is calling the private underscore-prefixed helper notify_mod._resolve_env across a module boundary. Expose a public helper in notify.py (for example resolve_env or resolve_secret) and update notify_test to call that public API instead, following the same pattern used by other notify_mod/fetch_mod functions in this file. Ensure the new public helper preserves the current behavior for resolving --secret before send_test is invoked.src/vouch/digest.py (1)
158-216: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winStale-claim logic is duplicated between
digest.build()andmetrics.compute().
stale_claims(rows) is derived from a bespoke loop here, whilestale_totalis pulled from a separatemetrics.compute()call. The comment at Line 158-159 states they intend the same anchor/threshold semantics, but the criteria (retired-status set, anchor precedence,>vs>=boundary) live in two places. If they ever drift, the digest will show astale_totalcount that doesn't match the number ofstale_claimsrows returned (independent oflimittruncation), undermining the "read-only reviewer briefing" contract this module exists for.Consider extracting the staleness predicate/anchor computation into a single shared helper in
metrics.pythat bothcompute()anddigest.build()call.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vouch/digest.py` around lines 158 - 216, The stale-claim selection logic is duplicated between digest.build() and metrics.compute(), so the row list and stale_total can drift. Extract the shared staleness predicate and anchor calculation into a helper in metrics.py, and have both build() and compute() use that helper for retired-status filtering, anchor precedence, and the freshness boundary. Keep the existing limit handling only in digest.build(), but make the stale_claims rows and stale_total derive from the same shared logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/company-brain.md`:
- Around line 21-29: The `followup` entry in the company-brain table is missing
a required-field marker for `due_at`, while `followup_status` is already marked
required. Update the `followup` row in the table to show both `due_at` and
`followup_status` as required, keeping the wording aligned with the existing
`kind`/`frontmatter`/`rule` structure so readers see the full requirement set.
In `@src/vouch/cli.py`:
- Around line 1436-1437: The `notify test` CLI option `--secret` currently
allows a literal secret to be passed directly, which should be avoided. Update
the `notify_test` command in `src/vouch/cli.py` so `--secret` follows the same
`env:VAR` convention used by the webhook config path, or otherwise make the help
text for `--secret` explicitly warn that raw values are exposed in shell history
and process listings. Make sure the handling around the `--url`/`--secret`
options and the `notify_test` flow consistently enforces or documents this
behavior.
In `@src/vouch/fetch.py`:
- Around line 116-126: The non-redirect HTTPError path in the fetch loop leaks
the underlying response because only the redirect branch closes e; update the
exception handling in the fetch logic around _opener.open in src/vouch/fetch.py
so every caught urllib.error.HTTPError is closed before raising FetchError. Keep
the existing redirect handling in place, but ensure the 4xx/5xx branch also
calls e.close() (or otherwise closes the response) before re-raising, using the
current fetch loop and current/target flow as the location to fix.
- Around line 64-138: The fetch path in fetch_url/_check_url still relies on
urllib to resolve the host again at connection time, which leaves a DNS
rebinding gap after the initial public-IP validation. Update the transport used
by fetch_url so it connects to the vetted IP address returned by _check_url,
while still sending the original hostname for Host/SNI, and keep redirect
re-validation intact. Use the existing symbols _check_url, fetch_url, and
_opener as the main places to wire in the pinned-transport behavior.
In `@src/vouch/inbox.py`:
- Around line 94-142: Persist inbox scan state incrementally inside scan()
instead of only at the end, so successfully processed files are remembered even
if a later put_source or propose_page call fails. Update the state immediately
after a file’s source/proposal is accepted in the loop, using _save_state
alongside _load_state and the existing state[str(resolved)] tracking, while
keeping ScanResult behavior unchanged. This keeps scan() idempotent across
partial failures and preserves deduplication for already-processed files.
- Around line 145-162: The watch() poll loop in inbox.py should not die on a
single scan() failure; add per-iteration exception handling around the
scan(store, directory) call inside watch(). Catch transient/file-specific
exceptions, log or report them with enough context, then continue to the next
tick so the background poller keeps running; use the existing watch() and scan()
symbols to place the fix.
In `@src/vouch/jsonl_server.py`:
- Around line 251-259: The kb.list_pages JSONL handler in _h_list_pages is
returning the full Page model instead of matching the MCP surface contract used
by kb_list_pages. Update the return shape to a curated dict with only the same
fields as server.py’s kb_list_pages (for example id, title, type, tags,
metadata) and avoid exposing the full page body/content through
_store().list_pages() results.
- Around line 101-110: The _h_digest helper is letting explicit null inputs
bypass its defaults, so handle None values before parsing in this function. In
_h_digest, normalize p["since"], p["stale_days"], and p["limit"] so that null is
treated the same as a missing key before passing values to
metrics_mod.parse_since, int(...), and digest_mod.build. Keep the existing
default constants from digest_mod and metrics_mod, but ensure the fallback is
applied whenever the provided value is None.
In `@src/vouch/notify.py`:
- Around line 189-280: The sweep() logic in notify.py treats EVENT_CREATED as a
single global state, so one webhook’s success can clear retry eligibility for
another webhook that failed. Update sweep() and its state handling so
created/aged delivery tracking is per-hook (for example keyed by hook.url)
instead of shared across all hooks, and only mark proposal ids delivered for the
specific hook after that hook’s deliver() succeeds. Preserve the existing
idempotent behavior, but make _load_state/_save_state and the created/aged state
updates reflect each webhook’s independent delivery outcome.
- Around line 84-121: The numeric webhook config parsing in load_webhooks is not
handled consistently: converting backlog_threshold directly with int(...) can
raise a raw ValueError. Update the webhook construction logic in load_webhooks
so invalid backlog_threshold values are caught and re-raised as
NotifyConfigError with a clear message, matching the existing validation style
used for notify.webhooks entries and _resolve_env fields.
- Around line 124-148: The deliver() webhook sender currently forwards hook_url
directly into urllib.request.urlopen(), which can allow गैर-http schemes like
file://; add the same scheme allowlist used in fetch.py before building the
Request or calling urlopen so only http and https URLs are accepted. Update
deliver() to validate hook_url early, log-and-return False on disallowed
schemes, and keep the rest of the POST flow unchanged.
In `@src/vouch/onboarding.py`:
- Around line 429-466: The _merge_page_kinds helper in onboarding.py is treating
a present-but-invalid page_kinds value the same as a missing key, which can
silently append a second top-level block and shadow the original config. Update
the page_kinds handling so that only a missing key triggers the append/merge
path, and any existing non-dict value raises a ValueError instead. Keep the fix
inside _merge_page_kinds, using the existing loaded/existing checks and
preserving the current merge behavior for valid mappings.
---
Nitpick comments:
In `@src/vouch/cli.py`:
- Around line 1435-1447: The notify_test command in cli.py is calling the
private underscore-prefixed helper notify_mod._resolve_env across a module
boundary. Expose a public helper in notify.py (for example resolve_env or
resolve_secret) and update notify_test to call that public API instead,
following the same pattern used by other notify_mod/fetch_mod functions in this
file. Ensure the new public helper preserves the current behavior for resolving
--secret before send_test is invoked.
In `@src/vouch/digest.py`:
- Around line 158-216: The stale-claim selection logic is duplicated between
digest.build() and metrics.compute(), so the row list and stale_total can drift.
Extract the shared staleness predicate and anchor calculation into a helper in
metrics.py, and have both build() and compute() use that helper for
retired-status filtering, anchor precedence, and the freshness boundary. Keep
the existing limit handling only in digest.build(), but make the stale_claims
rows and stale_total derive from the same shared logic.
In `@tests/test_brain_commands.py`:
- Around line 44-55: The AST guard in test_intake_modules_have_no_approve_import
only checks ImportFrom names and direct .approve attributes, so it can miss
getattr-based access. Update the test to also scan for getattr(...) calls that
use the string "approve" (and any similar indirection) in the same module walk,
using the existing test_intake_modules_have_no_approve_import logic as the place
to extend the safeguard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 815b744b-9807-4085-b6fc-ad2292748f0a
📒 Files selected for processing (32)
CHANGELOG.mdadapters/claude-code/.claude/commands/vouch-ask.mdadapters/claude-code/.claude/commands/vouch-followup.mdadapters/claude-code/.claude/commands/vouch-record.mdadapters/claude-code/.claude/commands/vouch-remember.mdadapters/claude-code/.claude/commands/vouch-standup.mdadapters/claude-code/install.yamldocs/company-brain.mdopenclaw.plugin.jsonsrc/vouch/capabilities.pysrc/vouch/capture.pysrc/vouch/cli.pysrc/vouch/digest.pysrc/vouch/fetch.pysrc/vouch/inbox.pysrc/vouch/jsonl_server.pysrc/vouch/notify.pysrc/vouch/onboarding.pysrc/vouch/page_filters.pysrc/vouch/page_kinds.pysrc/vouch/proposals.pysrc/vouch/server.pytests/test_brain_commands.pytests/test_capture.pytests/test_digest.pytests/test_fetch.pytests/test_inbox.pytests/test_install_adapter.pytests/test_notify.pytests/test_onboarding.pytests/test_page_filters.pytests/test_page_kinds.py
| | kind | frontmatter | rule | | ||
| |---|---|---| | ||
| | `contact` | `role` (required), `org`, `email` | a person you work with | | ||
| | `org` | `website` | a company or organisation | | ||
| | `project-record` | `record_status` (required), `owner` | one project's living record | | ||
| | `meeting-notes` | `date`, `attendees` | notes from one meeting | | ||
| | `followup` | `due_at`, `followup_status` (required), `owner` | a dated commitment | | ||
| | `decision-record` | — | must cite claims/sources | | ||
| | `voice` | — | must cite the examples it distills | |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
followup row understates required fields.
The table marks only followup_status as (required) for followup, but the onboarding test confirms both fields are required: assert "due_at" in required" and "assert "followup_status" in required (per tests/test_onboarding.py). Update the table to mark due_at as required too, or readers will assume it's optional.
📝 Proposed fix
-| `followup` | `due_at`, `followup_status` (required), `owner` | a dated commitment |
+| `followup` | `due_at` (required), `followup_status` (required), `owner` | a dated commitment |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | kind | frontmatter | rule | | |
| |---|---|---| | |
| | `contact` | `role` (required), `org`, `email` | a person you work with | | |
| | `org` | `website` | a company or organisation | | |
| | `project-record` | `record_status` (required), `owner` | one project's living record | | |
| | `meeting-notes` | `date`, `attendees` | notes from one meeting | | |
| | `followup` | `due_at`, `followup_status` (required), `owner` | a dated commitment | | |
| | `decision-record` | — | must cite claims/sources | | |
| | `voice` | — | must cite the examples it distills | | |
| | kind | frontmatter | rule | | |
| |---|---|---| | |
| | `contact` | `role` (required), `org`, `email` | a person you work with | | |
| | `org` | `website` | a company or organisation | | |
| | `project-record` | `record_status` (required), `owner` | one project's living record | | |
| | `meeting-notes` | `date`, `attendees` | notes from one meeting | | |
| | `followup` | `due_at` (required), `followup_status` (required), `owner` | a dated commitment | | |
| | `decision-record` | — | must cite claims/sources | | |
| | `voice` | — | must cite the examples it distills | |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/company-brain.md` around lines 21 - 29, The `followup` entry in the
company-brain table is missing a required-field marker for `due_at`, while
`followup_status` is already marked required. Update the `followup` row in the
table to show both `due_at` and `followup_status` as required, keeping the
wording aligned with the existing `kind`/`frontmatter`/`rule` structure so
readers see the full requirement set.
| @click.option("--url", required=True) | ||
| @click.option("--secret", default=None, help="Optional hmac secret (or env:VAR).") |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟡 Minor | ⚡ Quick win
--secret accepts raw secrets on the CLI.
Unlike webhook config secrets (which support env:VAR), notify test --secret lets a caller pass the literal secret value as a CLI argument, which lands in shell history and is visible via ps. Consider requiring env:VAR for this flag (mirroring the config-side convention) or documenting the risk in --help.
Also applies to: 1441-1444
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/cli.py` around lines 1436 - 1437, The `notify test` CLI option
`--secret` currently allows a literal secret to be passed directly, which should
be avoided. Update the `notify_test` command in `src/vouch/cli.py` so `--secret`
follows the same `env:VAR` convention used by the webhook config path, or
otherwise make the help text for `--secret` explicitly warn that raw values are
exposed in shell history and process listings. Make sure the handling around the
`--url`/`--secret` options and the `notify_test` flow consistently enforces or
documents this behavior.
| def _addr_is_public(address: str) -> bool: | ||
| try: | ||
| ip = ipaddress.ip_address(address) | ||
| except ValueError: | ||
| return False | ||
| return not ( | ||
| ip.is_private | ||
| or ip.is_loopback | ||
| or ip.is_link_local | ||
| or ip.is_multicast | ||
| or ip.is_reserved | ||
| or ip.is_unspecified | ||
| ) | ||
|
|
||
|
|
||
| def _check_url(url: str, *, allow_private: bool) -> None: | ||
| parsed = urllib.parse.urlparse(url) | ||
| if parsed.scheme not in ("http", "https"): | ||
| raise FetchError(f"only http/https URLs can be snapshotted, got {url!r}") | ||
| host = parsed.hostname | ||
| if not host: | ||
| raise FetchError(f"URL has no host: {url!r}") | ||
| if allow_private: | ||
| return | ||
| try: | ||
| infos = socket.getaddrinfo(host, parsed.port or 0, proto=socket.IPPROTO_TCP) | ||
| except OSError as e: | ||
| raise FetchError(f"cannot resolve {host!r}: {e}") from e | ||
| for info in infos: | ||
| address = str(info[4][0]) | ||
| if not _addr_is_public(address): | ||
| raise FetchError( | ||
| f"{host!r} resolves to non-public address {address} — refusing to fetch" | ||
| ) | ||
|
|
||
|
|
||
| def fetch_url( | ||
| url: str, | ||
| *, | ||
| max_bytes: int = DEFAULT_MAX_BYTES, | ||
| timeout: float = DEFAULT_TIMEOUT, | ||
| allow_private: bool = False, | ||
| ) -> FetchResult: | ||
| """Fetch `url` with redirect re-validation and a byte cap. | ||
|
|
||
| allow_private skips the public-address check — used by tests that run a | ||
| loopback fixture server; production callers leave it False. | ||
| """ | ||
| current = url | ||
| for _ in range(MAX_REDIRECTS + 1): | ||
| _check_url(current, allow_private=allow_private) | ||
| req = urllib.request.Request(current, headers={"User-Agent": _USER_AGENT}) | ||
| try: | ||
| resp = _opener.open(req, timeout=timeout) | ||
| except urllib.error.HTTPError as e: | ||
| if e.code in (301, 302, 303, 307, 308): | ||
| target = e.headers.get("Location") | ||
| e.close() | ||
| if not target: | ||
| raise FetchError(f"redirect from {current!r} without Location") from e | ||
| current = urllib.parse.urljoin(current, target) | ||
| continue | ||
| raise FetchError(f"HTTP {e.code} fetching {current!r}") from e | ||
| except OSError as e: | ||
| raise FetchError(f"fetch failed for {current!r}: {e}") from e | ||
| with resp: | ||
| body = resp.read(max_bytes + 1) | ||
| if len(body) > max_bytes: | ||
| raise FetchError( | ||
| f"{current!r} exceeds the {max_bytes}-byte snapshot cap" | ||
| ) | ||
| content_type = resp.headers.get("Content-Type", "application/octet-stream") | ||
| media_type = content_type.split(";")[0].strip() or "application/octet-stream" | ||
| return FetchResult(content=body, final_url=current, media_type=media_type) | ||
| raise FetchError(f"too many redirects fetching {url!r}") |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n "fetch|source|trust" openclaw.plugin.json
rg -n "source_fetch|fetch_mod\." src/vouch/cli.pyRepository: vouchdev/vouch
Length of output: 389
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== openclaw.plugin.json =="
ast-grep outline openclaw.plugin.json --view expanded || true
nl -ba openclaw.plugin.json | sed -n '50,90p'
echo
echo "== src/vouch/fetch.py =="
ast-grep outline src/vouch/fetch.py --view expanded || true
nl -ba src/vouch/fetch.py | sed -n '1,220p'
echo
echo "== src/vouch/cli.py =="
ast-grep outline src/vouch/cli.py --view expanded || true
nl -ba src/vouch/cli.py | sed -n '1310,1360p'Repository: vouchdev/vouch
Length of output: 254
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== openclaw.plugin.json =="
sed -n '1,220p' openclaw.plugin.json
echo
echo "== src/vouch/fetch.py (relevant range) =="
cat -n src/vouch/fetch.py | sed -n '1,220p'
echo
echo "== src/vouch/cli.py (relevant range) =="
cat -n src/vouch/cli.py | sed -n '1310,1365p'Repository: vouchdev/vouch
Length of output: 11442
Pin the fetch transport to the vetted IP to close the DNS rebinding window. _check_url validates one resolution, but urllib can resolve again when opening the socket, so a hostile DNS change can still send the request to a private/internal address. Use a transport that connects to the already-approved IP while preserving Host/SNI, rather than relying on a second lookup.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/fetch.py` around lines 64 - 138, The fetch path in
fetch_url/_check_url still relies on urllib to resolve the host again at
connection time, which leaves a DNS rebinding gap after the initial public-IP
validation. Update the transport used by fetch_url so it connects to the vetted
IP address returned by _check_url, while still sending the original hostname for
Host/SNI, and keep redirect re-validation intact. Use the existing symbols
_check_url, fetch_url, and _opener as the main places to wire in the
pinned-transport behavior.
Source: Coding guidelines
| try: | ||
| resp = _opener.open(req, timeout=timeout) | ||
| except urllib.error.HTTPError as e: | ||
| if e.code in (301, 302, 303, 307, 308): | ||
| target = e.headers.get("Location") | ||
| e.close() | ||
| if not target: | ||
| raise FetchError(f"redirect from {current!r} without Location") from e | ||
| current = urllib.parse.urljoin(current, target) | ||
| continue | ||
| raise FetchError(f"HTTP {e.code} fetching {current!r}") from e |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Unclosed HTTPError response on non-redirect failures leaks the socket.
For redirect codes, e.close() is called before continuing; for every other HTTP error (4xx/5xx), e (which wraps the open response) is never closed before raise FetchError(...). Under repeated fetch attempts (e.g. via vouch source fetch retries, inbox/notify flows hitting broken links), this leaks file descriptors/sockets.
🔧 Proposed fix
raise FetchError(f"HTTP {e.code} fetching {current!r}") from ebecomes:
+ e.close()
raise FetchError(f"HTTP {e.code} fetching {current!r}") from e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| resp = _opener.open(req, timeout=timeout) | |
| except urllib.error.HTTPError as e: | |
| if e.code in (301, 302, 303, 307, 308): | |
| target = e.headers.get("Location") | |
| e.close() | |
| if not target: | |
| raise FetchError(f"redirect from {current!r} without Location") from e | |
| current = urllib.parse.urljoin(current, target) | |
| continue | |
| raise FetchError(f"HTTP {e.code} fetching {current!r}") from e | |
| try: | |
| resp = _opener.open(req, timeout=timeout) | |
| except urllib.error.HTTPError as e: | |
| if e.code in (301, 302, 303, 307, 308): | |
| target = e.headers.get("Location") | |
| e.close() | |
| if not target: | |
| raise FetchError(f"redirect from {current!r} without Location") from e | |
| current = urllib.parse.urljoin(current, target) | |
| continue | |
| e.close() | |
| raise FetchError(f"HTTP {e.code} fetching {current!r}") from e |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/fetch.py` around lines 116 - 126, The non-redirect HTTPError path
in the fetch loop leaks the underlying response because only the redirect branch
closes e; update the exception handling in the fetch logic around _opener.open
in src/vouch/fetch.py so every caught urllib.error.HTTPError is closed before
raising FetchError. Keep the existing redirect handling in place, but ensure the
4xx/5xx branch also calls e.close() (or otherwise closes the response) before
re-raising, using the current fetch loop and current/target flow as the location
to fix.
| def scan(store: KBStore, directory: Path, *, config: InboxConfig | None = None) -> ScanResult: | ||
| """One pass: register + propose every new or changed eligible file.""" | ||
| cfg = config or load_config(store) | ||
| if not cfg.enabled: | ||
| return ScanResult() | ||
|
|
||
| state = _load_state(store) | ||
| proposed: list[str] = [] | ||
| skipped: list[str] = [] | ||
|
|
||
| for path in sorted(p for p in directory.iterdir() if p.is_file()): | ||
| if path.suffix.lower() not in cfg.extensions: | ||
| skipped.append(path.name) | ||
| continue | ||
| # read through the same containment + O_NOFOLLOW hardening the MCP | ||
| # register_source_from_path entrypoint uses | ||
| resolved, data = store.read_under_root(path) | ||
| text = data.decode("utf-8", errors="replace") | ||
| if len(text.strip()) < cfg.min_chars: | ||
| skipped.append(path.name) | ||
| continue | ||
|
|
||
| source = store.put_source( | ||
| data, | ||
| title=path.name, | ||
| locator=str(resolved), | ||
| source_type="file", | ||
| media_type="text/markdown" if path.suffix.lower() == ".md" else "text/plain", | ||
| tags=["inbox"], | ||
| ) | ||
| if state.get(str(resolved)) == source.id: | ||
| skipped.append(path.name) | ||
| continue | ||
|
|
||
| proposal = propose_page( | ||
| store, | ||
| title=f"inbox: {path.stem}", | ||
| body=text, | ||
| page_type="log", | ||
| source_ids=[source.id], | ||
| proposed_by=INBOX_ACTOR, | ||
| rationale=f"imported from inbox file {path.name}; distill durable facts into claims", | ||
| tags=["inbox"], | ||
| ) | ||
| state[str(resolved)] = source.id | ||
| proposed.append(proposal.id) | ||
|
|
||
| _save_state(store, state) | ||
| return ScanResult(proposed=proposed, skipped=skipped) |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
State persisted only once at end of loop — partial failures cause duplicate re-proposals.
state is only written via _save_state after the entire for loop completes (line 141). If propose_page or store.put_source raises partway through (e.g. an unexpected validation error on one file), all files successfully processed earlier in the same scan() call never get their dedup key persisted. The next scan will treat them as new again, re-registering sources and re-proposing pages for content already proposed — breaking the idempotency contract this module's docstring promises and test_scan_seen_state_dedups_and_rescans_changed_files verifies for the happy path.
🛠️ Proposed fix: persist state incrementally
state = _load_state(store)
proposed: list[str] = []
skipped: list[str] = []
- for path in sorted(p for p in directory.iterdir() if p.is_file()):
- ...
- state[str(resolved)] = source.id
- proposed.append(proposal.id)
-
- _save_state(store, state)
- return ScanResult(proposed=proposed, skipped=skipped)
+ try:
+ for path in sorted(p for p in directory.iterdir() if p.is_file()):
+ ...
+ state[str(resolved)] = source.id
+ proposed.append(proposal.id)
+ finally:
+ _save_state(store, state)
+ return ScanResult(proposed=proposed, skipped=skipped)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def scan(store: KBStore, directory: Path, *, config: InboxConfig | None = None) -> ScanResult: | |
| """One pass: register + propose every new or changed eligible file.""" | |
| cfg = config or load_config(store) | |
| if not cfg.enabled: | |
| return ScanResult() | |
| state = _load_state(store) | |
| proposed: list[str] = [] | |
| skipped: list[str] = [] | |
| for path in sorted(p for p in directory.iterdir() if p.is_file()): | |
| if path.suffix.lower() not in cfg.extensions: | |
| skipped.append(path.name) | |
| continue | |
| # read through the same containment + O_NOFOLLOW hardening the MCP | |
| # register_source_from_path entrypoint uses | |
| resolved, data = store.read_under_root(path) | |
| text = data.decode("utf-8", errors="replace") | |
| if len(text.strip()) < cfg.min_chars: | |
| skipped.append(path.name) | |
| continue | |
| source = store.put_source( | |
| data, | |
| title=path.name, | |
| locator=str(resolved), | |
| source_type="file", | |
| media_type="text/markdown" if path.suffix.lower() == ".md" else "text/plain", | |
| tags=["inbox"], | |
| ) | |
| if state.get(str(resolved)) == source.id: | |
| skipped.append(path.name) | |
| continue | |
| proposal = propose_page( | |
| store, | |
| title=f"inbox: {path.stem}", | |
| body=text, | |
| page_type="log", | |
| source_ids=[source.id], | |
| proposed_by=INBOX_ACTOR, | |
| rationale=f"imported from inbox file {path.name}; distill durable facts into claims", | |
| tags=["inbox"], | |
| ) | |
| state[str(resolved)] = source.id | |
| proposed.append(proposal.id) | |
| _save_state(store, state) | |
| return ScanResult(proposed=proposed, skipped=skipped) | |
| def scan(store: KBStore, directory: Path, *, config: InboxConfig | None = None) -> ScanResult: | |
| """One pass: register + propose every new or changed eligible file.""" | |
| cfg = config or load_config(store) | |
| if not cfg.enabled: | |
| return ScanResult() | |
| state = _load_state(store) | |
| proposed: list[str] = [] | |
| skipped: list[str] = [] | |
| try: | |
| for path in sorted(p for p in directory.iterdir() if p.is_file()): | |
| if path.suffix.lower() not in cfg.extensions: | |
| skipped.append(path.name) | |
| continue | |
| # read through the same containment + O_NOFOLLOW hardening the MCP | |
| # register_source_from_path entrypoint uses | |
| resolved, data = store.read_under_root(path) | |
| text = data.decode("utf-8", errors="replace") | |
| if len(text.strip()) < cfg.min_chars: | |
| skipped.append(path.name) | |
| continue | |
| source = store.put_source( | |
| data, | |
| title=path.name, | |
| locator=str(resolved), | |
| source_type="file", | |
| media_type="text/markdown" if path.suffix.lower() == ".md" else "text/plain", | |
| tags=["inbox"], | |
| ) | |
| if state.get(str(resolved)) == source.id: | |
| skipped.append(path.name) | |
| continue | |
| proposal = propose_page( | |
| store, | |
| title=f"inbox: {path.stem}", | |
| body=text, | |
| page_type="log", | |
| source_ids=[source.id], | |
| proposed_by=INBOX_ACTOR, | |
| rationale=f"imported from inbox file {path.name}; distill durable facts into claims", | |
| tags=["inbox"], | |
| ) | |
| state[str(resolved)] = source.id | |
| proposed.append(proposal.id) | |
| finally: | |
| _save_state(store, state) | |
| return ScanResult(proposed=proposed, skipped=skipped) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/inbox.py` around lines 94 - 142, Persist inbox scan state
incrementally inside scan() instead of only at the end, so successfully
processed files are remembered even if a later put_source or propose_page call
fails. Update the state immediately after a file’s source/proposal is accepted
in the loop, using _save_state alongside _load_state and the existing
state[str(resolved)] tracking, while keeping ScanResult behavior unchanged. This
keeps scan() idempotent across partial failures and preserves deduplication for
already-processed files.
| def _h_list_pages(p: dict) -> list[dict]: | ||
| pages = filter_pages( | ||
| _store().list_pages(), | ||
| kind=p.get("type"), | ||
| equals=p.get("meta"), | ||
| before=p.get("meta_before"), | ||
| after=p.get("meta_after"), | ||
| ) | ||
| return [pg.model_dump(mode="json") for pg in pages] |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
kb.list_pages payload diverges from the MCP surface's contract.
_h_list_pages returns pg.model_dump(mode="json") (the full Page model), whereas server.py's kb_list_pages returns a curated {id, title, type, tags, metadata} dict for the same method name. This means the JSONL transport exposes strictly more data (likely full page body/content) for a "list" operation than MCP does, which is both a payload-size and information-exposure inconsistency across transports for an identically-named method.
♻️ Proposed fix to align with server.py's contract
def _h_list_pages(p: dict) -> list[dict]:
pages = filter_pages(
_store().list_pages(),
kind=p.get("type"),
equals=p.get("meta"),
before=p.get("meta_before"),
after=p.get("meta_after"),
)
- return [pg.model_dump(mode="json") for pg in pages]
+ return [
+ {"id": pg.id, "title": pg.title, "type": pg.type, "tags": pg.tags, "metadata": pg.metadata}
+ for pg in pages
+ ]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _h_list_pages(p: dict) -> list[dict]: | |
| pages = filter_pages( | |
| _store().list_pages(), | |
| kind=p.get("type"), | |
| equals=p.get("meta"), | |
| before=p.get("meta_before"), | |
| after=p.get("meta_after"), | |
| ) | |
| return [pg.model_dump(mode="json") for pg in pages] | |
| def _h_list_pages(p: dict) -> list[dict]: | |
| pages = filter_pages( | |
| _store().list_pages(), | |
| kind=p.get("type"), | |
| equals=p.get("meta"), | |
| before=p.get("meta_before"), | |
| after=p.get("meta_after"), | |
| ) | |
| return [ | |
| {"id": pg.id, "title": pg.title, "type": pg.type, "tags": pg.tags, "metadata": pg.metadata} | |
| for pg in pages | |
| ] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/jsonl_server.py` around lines 251 - 259, The kb.list_pages JSONL
handler in _h_list_pages is returning the full Page model instead of matching
the MCP surface contract used by kb_list_pages. Update the return shape to a
curated dict with only the same fields as server.py’s kb_list_pages (for example
id, title, type, tags, metadata) and avoid exposing the full page body/content
through _store().list_pages() results.
| def load_webhooks(store: KBStore) -> list[Webhook]: | ||
| try: | ||
| loaded = yaml.safe_load(store.config_path.read_text(encoding="utf-8")) | ||
| except (OSError, yaml.YAMLError): | ||
| return [] | ||
| if not isinstance(loaded, dict): | ||
| return [] | ||
| raw = loaded.get("notify") | ||
| if not isinstance(raw, dict): | ||
| return [] | ||
| hooks: list[Webhook] = [] | ||
| for item in raw.get("webhooks") or []: | ||
| if not isinstance(item, dict) or not item.get("url"): | ||
| raise NotifyConfigError("notify.webhooks entries need a url") | ||
| events = item.get("events") | ||
| hooks.append( | ||
| Webhook( | ||
| url=_resolve_env(str(item["url"]), what="notify.webhooks.url"), | ||
| events=( | ||
| tuple(str(e) for e in events) if isinstance(events, list) else ALL_EVENTS | ||
| ), | ||
| backlog_threshold=( | ||
| int(item["backlog_threshold"]) | ||
| if item.get("backlog_threshold") is not None | ||
| else None | ||
| ), | ||
| age_threshold=( | ||
| str(item["age_threshold"]) if item.get("age_threshold") else None | ||
| ), | ||
| secret=( | ||
| _resolve_env(str(item["secret"]), what="notify.webhooks.secret") | ||
| if item.get("secret") | ||
| else None | ||
| ), | ||
| include_summary=bool(item.get("include_summary", False)), | ||
| ) | ||
| ) | ||
| return hooks |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Numeric config fields aren't validated consistently.
int(item["backlog_threshold"]) (line 106) has no error handling, unlike the url check above it which raises a clean NotifyConfigError. A typo like backlog_threshold: "many" in YAML will surface as a raw, unhandled ValueError instead of a consistent config error.
🛠️ Proposed fix
+def _parse_int(raw: object, *, what: str) -> int | None:
+ if raw is None:
+ return None
+ try:
+ return int(raw)
+ except (TypeError, ValueError) as e:
+ raise NotifyConfigError(f"{what} must be an integer") from e
+
+
def load_webhooks(store: KBStore) -> list[Webhook]:
...
- backlog_threshold=(
- int(item["backlog_threshold"])
- if item.get("backlog_threshold") is not None
- else None
- ),
+ backlog_threshold=_parse_int(
+ item.get("backlog_threshold"), what="notify.webhooks.backlog_threshold"
+ ),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def load_webhooks(store: KBStore) -> list[Webhook]: | |
| try: | |
| loaded = yaml.safe_load(store.config_path.read_text(encoding="utf-8")) | |
| except (OSError, yaml.YAMLError): | |
| return [] | |
| if not isinstance(loaded, dict): | |
| return [] | |
| raw = loaded.get("notify") | |
| if not isinstance(raw, dict): | |
| return [] | |
| hooks: list[Webhook] = [] | |
| for item in raw.get("webhooks") or []: | |
| if not isinstance(item, dict) or not item.get("url"): | |
| raise NotifyConfigError("notify.webhooks entries need a url") | |
| events = item.get("events") | |
| hooks.append( | |
| Webhook( | |
| url=_resolve_env(str(item["url"]), what="notify.webhooks.url"), | |
| events=( | |
| tuple(str(e) for e in events) if isinstance(events, list) else ALL_EVENTS | |
| ), | |
| backlog_threshold=( | |
| int(item["backlog_threshold"]) | |
| if item.get("backlog_threshold") is not None | |
| else None | |
| ), | |
| age_threshold=( | |
| str(item["age_threshold"]) if item.get("age_threshold") else None | |
| ), | |
| secret=( | |
| _resolve_env(str(item["secret"]), what="notify.webhooks.secret") | |
| if item.get("secret") | |
| else None | |
| ), | |
| include_summary=bool(item.get("include_summary", False)), | |
| ) | |
| ) | |
| return hooks | |
| def _parse_int(raw: object, *, what: str) -> int | None: | |
| if raw is None: | |
| return None | |
| try: | |
| return int(raw) | |
| except (TypeError, ValueError) as e: | |
| raise NotifyConfigError(f"{what} must be an integer") from e | |
| def load_webhooks(store: KBStore) -> list[Webhook]: | |
| try: | |
| loaded = yaml.safe_load(store.config_path.read_text(encoding="utf-8")) | |
| except (OSError, yaml.YAMLError): | |
| return [] | |
| if not isinstance(loaded, dict): | |
| return [] | |
| raw = loaded.get("notify") | |
| if not isinstance(raw, dict): | |
| return [] | |
| hooks: list[Webhook] = [] | |
| for item in raw.get("webhooks") or []: | |
| if not isinstance(item, dict) or not item.get("url"): | |
| raise NotifyConfigError("notify.webhooks entries need a url") | |
| events = item.get("events") | |
| hooks.append( | |
| Webhook( | |
| url=_resolve_env(str(item["url"]), what="notify.webhooks.url"), | |
| events=( | |
| tuple(str(e) for e in events) if isinstance(events, list) else ALL_EVENTS | |
| ), | |
| backlog_threshold=_parse_int( | |
| item.get("backlog_threshold"), what="notify.webhooks.backlog_threshold" | |
| ), | |
| age_threshold=( | |
| str(item["age_threshold"]) if item.get("age_threshold") else None | |
| ), | |
| secret=( | |
| _resolve_env(str(item["secret"]), what="notify.webhooks.secret") | |
| if item.get("secret") | |
| else None | |
| ), | |
| include_summary=bool(item.get("include_summary", False)), | |
| ) | |
| ) | |
| return hooks |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/notify.py` around lines 84 - 121, The numeric webhook config
parsing in load_webhooks is not handled consistently: converting
backlog_threshold directly with int(...) can raise a raw ValueError. Update the
webhook construction logic in load_webhooks so invalid backlog_threshold values
are caught and re-raised as NotifyConfigError with a clear message, matching the
existing validation style used for notify.webhooks entries and _resolve_env
fields.
| def deliver( | ||
| hook_url: str, | ||
| envelope: dict[str, object], | ||
| *, | ||
| secret: str | None = None, | ||
| timeout: float = DEFAULT_TIMEOUT, | ||
| ) -> bool: | ||
| """POST the envelope; log-and-swallow every failure. Returns success.""" | ||
| body = json.dumps(envelope, sort_keys=True).encode("utf-8") | ||
| headers = {"Content-Type": "application/json", "User-Agent": "vouch-notify"} | ||
| if secret: | ||
| headers[SIGNATURE_HEADER] = hmac.new( | ||
| secret.encode("utf-8"), body, hashlib.sha256 | ||
| ).hexdigest() | ||
| req = urllib.request.Request(hook_url, data=body, headers=headers, method="POST") | ||
| try: | ||
| with urllib.request.urlopen(req, timeout=timeout) as resp: | ||
| ok = 200 <= resp.status < 300 | ||
| except (urllib.error.URLError, OSError, ValueError) as e: | ||
| logger.warning("notify delivery to %s failed: %s", hook_url, e) | ||
| return False | ||
| if not ok: | ||
| logger.warning("notify delivery to %s got HTTP %s", hook_url, resp.status) | ||
| return ok | ||
|
|
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
python3 - <<'PY'
import urllib.request, tempfile, pathlib, os, json
# probe file:// behavior
with tempfile.NamedTemporaryFile(delete=False) as f:
f.write(b"hello")
path = pathlib.Path(f.name).resolve()
url = path.as_uri()
print("URL:", url)
req = urllib.request.Request(url, method="GET")
try:
with urllib.request.urlopen(req, timeout=5) as resp:
print("TYPE:", type(resp).__name__)
print("HAS_STATUS:", hasattr(resp, "status"))
try:
print("STATUS:", resp.status)
except Exception as e:
print("STATUS_ERR:", type(e).__name__, e)
try:
print("READ:", resp.read(20))
except Exception as e:
print("READ_ERR:", type(e).__name__, e)
except Exception as e:
print("ERR:", type(e).__name__, e)
finally:
os.unlink(path)
PYRepository: vouchdev/vouch
Length of output: 243
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n## notify.py\n'
sed -n '1,220p' src/vouch/notify.py
printf '\n## fetch.py\n'
if [ -f src/vouch/fetch.py ]; then
sed -n '1,260p' src/vouch/fetch.py
else
echo 'src/vouch/fetch.py not found'
fiRepository: vouchdev/vouch
Length of output: 12984
🏁 Script executed:
python3 - <<'PY'
import urllib.request, tempfile, pathlib, os
with tempfile.NamedTemporaryFile(delete=False) as f:
f.write(b'hello')
path = pathlib.Path(f.name).resolve()
url = path.as_uri()
print("URL:", url)
for method, data in [("GET", None), ("POST", b'{}')]:
req = urllib.request.Request(url, data=data, method=method)
try:
with urllib.request.urlopen(req, timeout=5) as resp:
print(method, "TYPE:", type(resp).__name__)
print(method, "STATUS:", getattr(resp, "status", "<missing>"))
print(method, "READ:", resp.read(20))
except Exception as e:
print(method, "ERR:", type(e).__name__, e)
os.unlink(path)
PYRepository: vouchdev/vouch
Length of output: 298
Restrict webhook URLs to http/https
deliver() passes hook_url straight to urllib.request.urlopen(), and file:// URLs are accepted there even for POST, which can read local files. Mirror fetch.py’s scheme allowlist before dispatching.
🧰 Tools
🪛 ast-grep (0.44.0)
[info] 131-131: use jsonify instead of json.dumps for JSON output
Context: json.dumps(envelope, sort_keys=True)
Note: [CWE-116] Improper Encoding or Escaping of Output.
(use-jsonify)
[warning] 139-139: Request-controlled URL passed to urlopen; validate against an allowlist to prevent SSRF.
Context: urllib.request.urlopen(req, timeout=timeout)
Note: [CWE-918] Server-Side Request Forgery (SSRF).
(urlopen-unsanitized-data)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/notify.py` around lines 124 - 148, The deliver() webhook sender
currently forwards hook_url directly into urllib.request.urlopen(), which can
allow गैर-http schemes like file://; add the same scheme allowlist used in
fetch.py before building the Request or calling urlopen so only http and https
URLs are accepted. Update deliver() to validate hook_url early, log-and-return
False on disallowed schemes, and keep the rest of the POST flow unchanged.
| def sweep(store: KBStore, *, now: datetime | None = None) -> list[str]: | ||
| """Evaluate triggers against the pending queue and fire subscribed hooks. | ||
|
|
||
| Idempotent per (event, proposal id): a re-run fires nothing new unless | ||
| the queue changed. Returns the event names fired (with repeats per hook). | ||
| """ | ||
| hooks = load_webhooks(store) | ||
| if not hooks: | ||
| return [] | ||
| now = now or datetime.now(UTC) | ||
| pending = store.list_proposals(ProposalStatus.PENDING) | ||
| pending_ids = [p.id for p in pending] | ||
| state = _load_state(store.kb_dir) | ||
| fired: list[str] = [] | ||
|
|
||
| for hook in hooks: | ||
| if EVENT_CREATED in hook.events: | ||
| new_ids = [pid for pid in pending_ids if pid not in state["created"]] | ||
| if new_ids: | ||
| summaries = ( | ||
| [ | ||
| str(p.payload.get("title") or p.payload.get("text") or "")[:120] | ||
| for p in pending | ||
| if p.id in new_ids | ||
| ] | ||
| if hook.include_summary | ||
| else None | ||
| ) | ||
| if deliver( | ||
| hook.url, | ||
| _envelope( | ||
| store, EVENT_CREATED, | ||
| proposal_ids=new_ids, pending_count=len(pending), | ||
| now=now, summaries=summaries, | ||
| ), | ||
| secret=hook.secret, | ||
| ): | ||
| fired.append(EVENT_CREATED) | ||
|
|
||
| if ( | ||
| EVENT_BACKLOGGED in hook.events | ||
| and hook.backlog_threshold is not None | ||
| and len(pending) >= hook.backlog_threshold | ||
| ): | ||
| marker = f"{hook.url}:{hook.backlog_threshold}" | ||
| if marker not in state["backlogged"] and deliver( | ||
| hook.url, | ||
| _envelope( | ||
| store, EVENT_BACKLOGGED, | ||
| proposal_ids=pending_ids, pending_count=len(pending), now=now, | ||
| ), | ||
| secret=hook.secret, | ||
| ): | ||
| fired.append(EVENT_BACKLOGGED) | ||
| state["backlogged"].append(marker) | ||
|
|
||
| if EVENT_AGED in hook.events and hook.age_threshold: | ||
| cutoff = parse_since(hook.age_threshold, now=now) | ||
| aged_ids = [ | ||
| p.id | ||
| for p in pending | ||
| if cutoff is not None | ||
| and ( | ||
| p.proposed_at.replace(tzinfo=UTC) | ||
| if p.proposed_at.tzinfo is None | ||
| else p.proposed_at | ||
| ) | ||
| < cutoff | ||
| and p.id not in state["aged"] | ||
| ] | ||
| if aged_ids and deliver( | ||
| hook.url, | ||
| _envelope( | ||
| store, EVENT_AGED, | ||
| proposal_ids=aged_ids, pending_count=len(pending), now=now, | ||
| ), | ||
| secret=hook.secret, | ||
| ): | ||
| fired.append(EVENT_AGED) | ||
| state["aged"].extend(aged_ids) | ||
|
|
||
| # created-state is hook-independent: a proposal counts as announced once | ||
| # any hook accepted it. drop decided proposals so the state can't grow | ||
| # without bound. | ||
| if any(e == EVENT_CREATED for e in fired): | ||
| state["created"] = sorted(set(state["created"]) | set(pending_ids)) | ||
| state["created"] = [pid for pid in state["created"] if pid in pending_ids] | ||
| state["aged"] = [pid for pid in state["aged"] if pid in pending_ids] | ||
| if not pending: | ||
| state["backlogged"] = [] | ||
| _save_state(store.kb_dir, state) | ||
| return fired |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift
Multi-webhook created state is hook-independent — a failing hook silently loses its retry window.
Per the comment at lines 270-272, a proposal is marked "created" globally once any hook accepts delivery of EVENT_CREATED. With two configured webhooks, if hook A succeeds and hook B's deliver() fails in the same sweep (dead endpoint, transient network error), pending_ids still gets folded into state["created"] at line 274 because fired contains at least one EVENT_CREATED entry. Hook B will never be retried for those proposal ids — even though its own delivery attempt failed — because the state doesn't track per-hook delivery outcomes.
This contradicts the module's "best-effort... dead endpoint can never wedge a sweep" framing for delivery, since here a dead endpoint permanently loses (not just delays) that notification once a sibling hook succeeds. No test exercises multiple webhooks with mixed success/failure to catch this.
Consider keying state["created"]/state["aged"] per-hook (e.g. dict[hook_url, list[str]]) so each webhook independently tracks and retries its own undelivered events.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/notify.py` around lines 189 - 280, The sweep() logic in notify.py
treats EVENT_CREATED as a single global state, so one webhook’s success can
clear retry eligibility for another webhook that failed. Update sweep() and its
state handling so created/aged delivery tracking is per-hook (for example keyed
by hook.url) instead of shared across all hooks, and only mark proposal ids
delivered for the specific hook after that hook’s deliver() succeeds. Preserve
the existing idempotent behavior, but make _load_state/_save_state and the
created/aged state updates reflect each webhook’s independent delivery outcome.
| def _merge_page_kinds(store: KBStore, kinds: dict[str, dict[str, object]]) -> list[str]: | ||
| """Add missing `page_kinds` entries to config.yaml; return the names added. | ||
|
|
||
| When the file has no `page_kinds` key at all the block is appended | ||
| textually so operator comments and formatting survive. When the key | ||
| exists the file is parsed, missing kinds are filled in, and the mapping | ||
| is re-dumped (a yaml round-trip drops comments — the narrow price of a | ||
| real merge). | ||
| """ | ||
| path = store.kb_dir / "config.yaml" | ||
| text = path.read_text(encoding="utf-8") if path.exists() else "" | ||
| try: | ||
| loaded = yaml.safe_load(text) or {} | ||
| except yaml.YAMLError as e: | ||
| raise ValueError(f"config.yaml is not valid yaml; fix it first: {e}") from e | ||
| if not isinstance(loaded, dict): | ||
| raise ValueError("config.yaml must be a yaml mapping") | ||
|
|
||
| existing = loaded.get("page_kinds") | ||
| if not isinstance(existing, dict): | ||
| block = yaml.safe_dump( | ||
| {"page_kinds": kinds}, sort_keys=False, default_flow_style=False | ||
| ) | ||
| prefix = text.rstrip("\n") | ||
| joined = f"{prefix}\n\n{block}" if prefix else block | ||
| path.write_text(joined, encoding="utf-8") | ||
| return sorted(kinds) | ||
|
|
||
| added = sorted(k for k in kinds if k not in existing) | ||
| if not added: | ||
| return [] | ||
| for name in added: | ||
| existing[name] = kinds[name] | ||
| path.write_text( | ||
| yaml.safe_dump(loaded, sort_keys=False, default_flow_style=False), | ||
| encoding="utf-8", | ||
| ) | ||
| return added |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Map the file and inspect the relevant function with line numbers.
ast-grep outline src/vouch/onboarding.py --view expanded || true
printf '\n--- relevant slice ---\n'
sed -n '380,500p' src/vouch/onboarding.py | cat -n
printf '\n--- search for page_kinds handling ---\n'
rg -n "page_kinds|_merge_page_kinds|config.yaml" src/vouch/onboarding.py src -SRepository: vouchdev/vouch
Length of output: 13086
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n--- page_kinds reader ---\n'
sed -n '220,280p' src/vouch/page_kinds.py | cat -n
printf '\n--- any validation helpers mentioning page_kinds ---\n'
rg -n "page_kinds.*must|page_kinds.*mapping|load_page_kind_registry|validate_page|PageKindError" src/vouch/page_kinds.py src/vouch -SRepository: vouchdev/vouch
Length of output: 4630
🏁 Script executed:
python3 - <<'PY'
import yaml
doc = "page_kinds: old\n\npage_kinds:\n new: {}\n"
print(yaml.safe_load(doc))
PYRepository: vouchdev/vouch
Length of output: 180
Treat malformed page_kinds as an error
A present-but-non-mapping page_kinds currently goes down the same path as a missing key and appends a second top-level page_kinds: block. On reload, the later block wins and the original value is dropped, so fail fast instead of silently shadowing bad config.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vouch/onboarding.py` around lines 429 - 466, The _merge_page_kinds helper
in onboarding.py is treating a present-but-invalid page_kinds value the same as
a missing key, which can silently append a second top-level block and shadow the
original config. Update the page_kinds handling so that only a missing key
triggers the append/merge path, and any existing non-dict value raises a
ValueError instead. Keep the fix inside _merge_page_kinds, using the existing
loaded/existing checks and preserving the current merge behavior for valid
mappings.
ruff flagged the two host-wrapper transcript entries at over 100 columns (e501); reformat only, no behaviour change.
There was a problem hiding this comment.
Pull request overview
This PR turns a vouch KB into a “company brain” workflow while preserving (and in one place tightening) the review gate: it introduces config-declared typed page kinds, adds read-only viewports for listing/querying pages and generating a reviewer digest, and adds operator-run intake/notification utilities that never reach approval paths.
Changes:
- Add the company-brain onboarding template (
vouch init --template company-brain) that merges typedpage_kindsinto config idempotently and seeds an approved guide page. - Add read-only viewports: frontmatter filters for
kb.list_pages/vouch pages, pluskb.digest/vouch digestas a reviewer briefing. - Add intake + proactivity plumbing:
vouch source fetch,vouch inbox,vouch notify, plus adapter slash commands/skills and capture title improvements.
Reviewed changes
Copilot reviewed 41 out of 43 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_page_kinds.py | Tests for YAML date scalars in string schema + protected self-approval behavior |
| tests/test_page_filters.py | Tests for page filters across CLI/MCP/JSONL surfaces |
| tests/test_openclaw_plugin_manifest.py | Extends plugin manifest skill-name expectations |
| tests/test_onboarding.py | Tests for template registration, idempotence, and config merge behavior |
| tests/test_notify.py | Tests for webhook sweep semantics, signing, idempotence, retries |
| tests/test_install_adapter.py | Verifies new Claude Code command files are installed/idempotent |
| tests/test_inbox.py | Tests inbox scan/watch behavior and approve-path guardrails |
| tests/test_fetch.py | Tests URL snapshot safety rules (scheme, redirects, cap, private-host guard) |
| tests/test_digest.py | Tests digest composition, formatting, and read-only guarantee |
| tests/test_capture.py | Tests transcript-based session title extraction and fallbacks |
| tests/test_brain_commands.py | Guards for “never approve” instructions + skills existence + intake no-approve imports |
| src/vouch/server.py | Adds kb.digest; extends kb.list_pages with filters + metadata |
| src/vouch/proposals.py | Enforces protected page kinds against trusted-agent self-approval |
| src/vouch/page_kinds.py | Adds protected-kind flag + accepts YAML date/datetime scalars for type: string |
| src/vouch/page_filters.py | New shared filter implementation for page listings |
| src/vouch/onboarding.py | Adds company-brain template, config merge logic, seeded guide artifacts |
| src/vouch/notify.py | New outbound webhook notifier (sweep/test, env secrets, HMAC signing) |
| src/vouch/jsonl_server.py | Registers kb.digest; adds list_pages filters in JSONL handler |
| src/vouch/inbox.py | New inbox importer (file drop → source + pending page proposal) |
| src/vouch/fetch.py | New URL snapshotter into content-addressed sources |
| src/vouch/digest.py | New read-only digest builder + text/markdown renderers |
| src/vouch/cli.py | Adds init templates, digest/pages/source fetch/inbox/notify commands; schema list shows protected |
| src/vouch/capabilities.py | Registers kb.digest capability |
| package.json | Bumps OpenClaw packaging version |
| openclaw.plugin.json | Updates plugin manifest fields and skills root |
| examples/decision-log/README.md | Updates example narrative to “decision evolution” framing |
| docs/img/examples/render.py | Updates rendered example commands/args |
| docs/img/examples/decision-log-search.svg | Updated rendered example output |
| docs/img/examples/decision-log-diff.svg | Updated rendered diff example output |
| docs/company-brain.md | New documentation for the workflow, tools, and constraints |
| CHANGELOG.md | Documents new workflow features under Unreleased |
| adapters/openclaw/skills/vouch-standup/SKILL.md | New OpenClaw skill spec |
| adapters/openclaw/skills/vouch-remember/SKILL.md | New OpenClaw skill spec |
| adapters/openclaw/skills/vouch-record/SKILL.md | New OpenClaw skill spec |
| adapters/openclaw/skills/vouch-followup/SKILL.md | New OpenClaw skill spec |
| adapters/openclaw/skills/vouch-ask/SKILL.md | New OpenClaw skill spec |
| adapters/claude-code/install.yaml | Installs the new Claude Code commands |
| adapters/claude-code/.claude/commands/vouch-standup.md | New Claude Code slash command |
| adapters/claude-code/.claude/commands/vouch-remember.md | New Claude Code slash command |
| adapters/claude-code/.claude/commands/vouch-record.md | New Claude Code slash command |
| adapters/claude-code/.claude/commands/vouch-followup.md | New Claude Code slash command |
| adapters/claude-code/.claude/commands/vouch-ask.md | New Claude Code slash command |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| existing = loaded.get("page_kinds") | ||
| if not isinstance(existing, dict): | ||
| block = yaml.safe_dump( | ||
| {"page_kinds": kinds}, sort_keys=False, default_flow_style=False | ||
| ) | ||
| prefix = text.rstrip("\n") | ||
| joined = f"{prefix}\n\n{block}" if prefix else block | ||
| path.write_text(joined, encoding="utf-8") | ||
| return sorted(kinds) |
| def _h_list_pages(p: dict) -> list[dict]: | ||
| pages = filter_pages( | ||
| _store().list_pages(), | ||
| kind=p.get("type"), | ||
| equals=p.get("meta"), | ||
| before=p.get("meta_before"), | ||
| after=p.get("meta_after"), | ||
| ) | ||
| return [pg.model_dump(mode="json") for pg in pages] |
| return [ | ||
| {"id": p.id, "title": p.title, "type": p.type, "tags": p.tags} | ||
| for p in _store().list_pages() | ||
| {"id": p.id, "title": p.title, "type": p.type, "tags": p.tags, "metadata": p.metadata} | ||
| for p in pages | ||
| ] |
| A content-hash-keyed seen-state sidecar (`.vouch/inbox-state.json`) makes | ||
| re-runs cheap and idempotent: an unchanged file is skipped, an edited file | ||
| re-proposes. The watch loop is a bounded stdlib poll (the `vault_sync` | ||
| precedent) — no daemon, no watchdog dependency. |
| for hook in hooks: | ||
| if EVENT_CREATED in hook.events: | ||
| new_ids = [pid for pid in pending_ids if pid not in state["created"]] | ||
| if new_ids: |
turns a vouch kb into a team memory that stays behind the review gate. every addition is a viewport or convention over machinery that already exists: records are config-declared page kinds, structured queries are filters over list_pages, proactivity is a one-shot cli in operator cron, and the llm stays host-side in the adapters. nothing new can reach proposals.approve() — the one gate change in here (protected kinds) tightens it.
what's in the branch:
page_kindsconfig, plus a cited guide page. merging into config.yaml is additive and idempotent; operator-declared kinds always win.vouch init --template company-brain.vouch init --templatewiring for the onboarding template registry (templates layer on top of the starter seed).vouch pageshuman mirror. deliberately not a query language.vouch digest: read-only reviewer briefing (pending oldest-first, recent decisions, stale claims, followups due, citation coverage), registered at all four sites. writes nothing; safe to run from cron.vouch source fetch <url>: snapshot a url's exact bytes as a content-addressed source so claims cite immutable evidence. http/https only, every redirect hop re-validated, private-network hosts refused, 2 mib cap, fetched_at recorded in source metadata.vouch inbox --dir <path> [--watch]: each dropped .md/.txt becomes a registered source plus one pending page proposal citing it — the capture.finalize shape with a file drop as the trigger. content-hash seen-state makes re-runs idempotent; the watch loop is a bounded stdlib poll, no daemon.vouch notify sweep|test: config-declared reviewer webhooks (proposal.created, queue.backlogged, proposal.aged) with optional hmac-sha256 signing and env: secret refs. read-and-notify only, idempotent per (event, proposal), best-effort delivery that can never wedge a sweep.page_kinds.<kind>.protected: trueexempts a kind from the trusted-agent self-approval opt-out, so policy-bearing pages (voice, decision-record in the template) always need a reviewer other than the proposer.due_at: 2026-07-01loads as a date object from cli --meta parsing and from every frontmatter disk round-trip, so pages that validated at propose used to fail re-validation at approve.session summary: <project> (<uuid>). files/date fallback when no transcript survives; the uuid stays in the body.testing: full gate green — 871 passed, 14 skipped, mypy clean, ruff clean. new suites: test_page_filters, test_digest, test_fetch, test_inbox, test_notify, test_brain_commands (including ast guards asserting the intake modules have no import or attribute path to approve()), plus extended onboarding, page-kinds, capture, and install-adapter tests. also exercised end-to-end through the real cli: init --template, propose a followup, approve, filter pages, run the digest.
docs: docs/company-brain.md covers the record conventions, the intake channels, webhooks, cron recipes, and the honest constraint that review throughput is the bottleneck by design. changelog updated under [Unreleased].
Summary by CodeRabbit
New Features
Bug Fixes