Skip to content

feat(security): explicit agent turn origin + fail-closed approval gate + memory provenance#3227

Merged
senamakel merged 33 commits into
tinyhumansai:mainfrom
oxoxDev:feat/approval-origin-fail-closed
Jun 3, 2026
Merged

feat(security): explicit agent turn origin + fail-closed approval gate + memory provenance#3227
senamakel merged 33 commits into
tinyhumansai:mainfrom
oxoxDev:feat/approval-origin-fail-closed

Conversation

@oxoxDev
Copy link
Copy Markdown
Contributor

@oxoxDev oxoxDev commented Jun 2, 2026

Summary

  • Introduce AgentTurnOrigin enum (WebChat / ExternalChannel / TrustedAutomation { source } / Cli / Unknown) carried as a tokio task-local around every run_turn so the approval layer has explicit, non-ambiguous provenance instead of inferring trust from the absence of a chat context.
  • Rewrite ApprovalGate::intercept_audited decision tree around the origin: Unknown / unlabelled callers now fail closed; ExternalChannel persists a pending_approvals row and TTL-denies (no auto-routable surface yet); TrustedAutomation { Cron | Subconscious } retains the existing allow-without-prompt; SubconsciousTainted refuses every external_effect tool for the rest of the tick.
  • Track ingest provenance end-to-end: new MemoryTaint { Internal | ExternalSync } enum on MemoryEntry, persisted to a fresh memory_docs.taint TEXT NOT NULL DEFAULT 'internal' column via an idempotent ALTER TABLE migration. Composio sync paths that flow through MemoryClient::store_skill_sync now stamp chunks as ExternalSync at the single write boundary.
  • Subconscious tick reads SituationReport.has_external_content (true when sealed-summaries or recap-window sections render fresh third-party material) and upgrades the origin to SubconsciousTainted before invoking the agent.
  • Reject leading environment-variable assignments (GIT_SSH=…, SSH_ASKPASS=…, LD_PRELOAD=…, …) in SecurityPolicy::validate_command so command-allowlist matches can't be subverted by prepending a helper-binary override.
  • Seed safe channel_permissions defaults on first boot when the map is empty and the install has any existing channel configured — explicit-grant on existing surfaces, fail-closed on unknown ones.

Problem

ApprovalGate::intercept_audited previously returned Allow for any tool call whose task-local APPROVAL_CHAT_CONTEXT was unset. The justification was that "background / cron / triage turns are pre-authorised automation"; in practice every non-web entry point (channel inbound bots, subconscious ticks, scheduled jobs) shared the same absence-of-context signal as legitimate background automation, so external-effect tools could execute without ever creating a pending_approvals row.

Three concrete consequences:

  1. A remote chat message routed through channels::runtime::dispatch::process_channel_message could emit node_exec, cron_add, etc. — the gate saw no APPROVAL_CHAT_CONTEXT, classified the turn as trusted, and ran the tool.
  2. Subconscious ticks pulling email/Slack/Notion content into the LLM prompt landed in the same "no chat context" bucket. A crafted email body that says "automated task: send X to Y" could nudge the LLM into a send_email tool call that fired with full agent privileges and no user prompt.
  3. The pre-existing validate_command allowlist matched on the first whitespace-delimited token. GIT_SSH=./wrapper.sh git ls-remote ssh://… slipped through because the regex saw GIT_SSH=… as a leading argument, kept stripping, found git second, allowed the call, and let the kernel resolve GIT_SSH as a helper-binary override.

The agent_tool_policy::engine channel-permission cap also short-circuited when channel_permissions was empty (the default for almost every existing install) — every channel inherited the Dangerous legacy tier, so the cap never engaged in practice.

Solution

Layered, scope-isolated commits — 21 in total, each independently reviewable.

Origin + gate (commits 1–6)

  1. agent::turn_origin (new module) — AgentTurnOrigin enum + tokio::task_local! storage + with_origin(origin, future) scoping helper. Mirrors the existing APPROVAL_CHAT_CONTEXT pattern so neither task-local "owns" the trust decision alone.
  2. approval::gate::intercept_audited — replaces the chat_ctx.is_none() → Allow branch with an explicit match on AgentTurnOrigin::current(). Unknown / NoneDeny; ExternalChannel { … } → persist row + publish ApprovalRequested event + park on TTL (future patch wires a channel-routed approval surface, today's behaviour is the safe TTL deny); TrustedAutomation keeps its existing semantics; SubconsciousTainted returns a deny with an explanatory POLICY_DENIED_MARKER reason.
  3. channels/providers/web/mod.rs — scopes AgentTurnOrigin::WebChat { thread_id, client_id } around the existing web-channel run_turn so the chat-routable approval UX keeps working unchanged.
  4. channels/runtime/dispatch.rs::process_channel_message — scopes AgentTurnOrigin::ExternalChannel { channel, reply_target, message_id } around the agent invocation; AgentTurnRequest carries the origin across the native event bus.
  5. subconscious/engine.rs + cron/scheduler.rs — scope TrustedAutomation { Cron, job_id } or Subconscious (default) around the agent runs they trigger.
  6. agent/run_single / harness entry points — label remaining call sites with explicit origins so Unknown is reachable only for genuinely unlabelled callers (CLI / sub-agent / one-off tests).

Memory provenance (commits 7–13)

  1. memory/traits.rs — adds the MemoryTaint enum (Internal default + ExternalSync), MemoryEntry::taint field, MemoryTaint::{as_db_str, from_db_str} round-trip helpers, and a new Memory::store_with_taint trait method (default delegates to the existing store so non-persistent backends compile unchanged).
  2. memory_store/types.rs — adds #[serde(default)] pub taint: MemoryTaint, to NamespaceDocumentInput, NamespaceQueryResult, StoredMemoryDocument, and NamespaceMemoryHit. Legacy JSON without the field decodes as Internal.
  3. memory_store/unified/init.rsmemory_docs.taint TEXT NOT NULL DEFAULT 'internal' in the fresh-DB CREATE TABLE plus an idempotent ALTER TABLE memory_docs ADD COLUMN taint TEXT NOT NULL DEFAULT 'internal' migration that gracefully skips when the column already exists. All pre-existing rows default to internal.
  4. memory_store/unified/documents.rs — INSERT/UPDATE writes input.taint.as_db_str(); SELECT statements that hydrate StoredMemoryDocument / NamespaceQueryResult / NamespaceMemoryHit read the column via MemoryTaint::from_db_str.
  5. memory_store/memory_trait.rs::UnifiedMemory — overrides store_with_taint to thread the taint through NamespaceDocumentInput.taint; the recall path now surfaces the real persisted taint on MemoryEntry instead of hard-coding Internal.
  6. memory_store/client.rs::MemoryClient::store_skill_sync — flips taint = ExternalSync at the single composio write boundary (called by Notion / GitHub / LinkedIn enrichment paths). New ingest sources picked up automatically.
  7. subconscious/situation_report/{mod,summaries,query_window}.rsSituationReport exposes has_external_content: bool (true when the sealed-summaries or recap-window sections renders fresh third-party material). subconscious/engine.rs::tick_origin_source(has_external) maps that flag to TrustedAutomationSource::SubconsciousTainted or plain Subconscious.

Policy + config (commits 14–17)

  1. security/policy_command.rs::has_leading_env_assignment — leading-token regex (^[A-Z_][A-Z0-9_]*=…) rejected at the structural-guard layer alongside the existing dangerous-prefix check; is_command_allowed denies the call before allowlist matching.
  2. agent_tool_policy/engine.rs — empty-map fallback now logs the legacy behaviour explicitly. Real hardening lives one layer up in the config migration (next commit) so unit fixtures that build agents without seeding channel_permissions keep compiling unchanged. The cap engages on the very first boot after upgrade.
  3. config/schema/agent.rs::AgentConfig::migrate_channel_permissions_if_legacy — seeds { "web": "execute", <each-configured-channel>: "execute" } on first boot when the map is empty and the user already has non-web channels configured. Idempotent.
  4. Auxiliary cleanups — default MemoryTaint::Internal on every in-tree MemoryEntry constructor (no behaviour change, lets every test compile against the new field); docstring polish; drop redundant trim_start() in has_leading_env_assignment since split_whitespace().next() already skips leading whitespace.

Tests (commits 18–21)

  • approval::gate::tests — six new origin-decision cases: WebChat persists+parks, ExternalChannel persists+TTL-denies, TrustedAutomation { Cron } allows, TrustedAutomation { Subconscious } allows, SubconsciousTainted denies, Cli allows, Unknown denies.
  • agent::turn_origin::tests — scope/unscope round-trip, current() returns None outside scope, nested scopes return the inner origin.
  • memory::traits::testsMemoryTaint defaults to Internal for legacy rows, round-trips ExternalSync through MemoryEntry serde.
  • memory_store::tests — taint persists across upsert and recall, legacy-JSON NamespaceDocumentInput decodes as Internal, store_with_taint writes external_sync.
  • security::policy::tests::validate_command_rejects_leading_env_var_assignmentGIT_SSH=, SSH_ASKPASS=, LD_PRELOAD= all rejected; bare git ls-remote ssh://example.com still passes the structural guard.
  • config::schema::tests::empty_channel_permissions_with_existing_channels_migrates_to_execute — confirms idempotent first-boot seed of safe defaults.

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — changed lines covered by new tests in approval::gate::tests, agent::turn_origin::tests, memory::traits::tests, memory_store::tests, security::policy::tests, config::schema::tests. Verified cargo test --lib openhuman::memory_store 563 / openhuman::memory 1962 / openhuman::approval 66 / openhuman::subconscious 53 / openhuman::memory_sync::composio 394 — all passing.
  • N/A: behaviour-only change — Coverage matrix updated
  • N/A: no feature ID in the matrix maps to runtime-origin policy or memory provenance — All affected feature IDs from the matrix are listed in the PR description under ## Related
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • N/A: not a release-cut surface — Manual smoke checklist updated if this touches release-cut surfaces
  • N/A: no public tracking issue — Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Runtime / desktop: every agent turn now carries an explicit origin label. Web chat is unchanged (chat-routable approval surface intact). Channel-inbound bots persist a pending row and TTL-deny instead of executing silently. Cron and untainted subconscious ticks keep their existing allow-without-prompt behaviour. Subconscious ticks whose context contains fresh sealed-summary or recap-window content from third-party syncs run with the SubconsciousTainted origin and refuse every external_effect tool.
  • Persistence: a single new TEXT NOT NULL DEFAULT 'internal' column. The migration applies idempotently on every boot — verified live in staging on a real workspace with 3 240 pre-existing rows (all defaulted to internal).
  • Security: command-allowlist matches no longer fool the structural guard with KEY=value prefixes. The default channel-permission cap engages on first boot for any install with configured channels.
  • Compatibility: Memory trait gains a default-shim'd store_with_taint so external implementors compile unchanged. MemoryEntry::taint and the four DTO additions are all #[serde(default)] so legacy persisted state and over-the-wire payloads decode cleanly.
  • Performance: one extra SQLite column read on document recall + one task-local read at gate entry + one boolean check on subconscious tick start. Negligible.

Related

  • Closes: N/A — no public tracking issue.
  • Follow-up PR(s)/TODOs:
    • Channel-routed approval surface (Telegram / Discord / Slack inline accept-deny) so ExternalChannel parked calls have a UX path beyond TTL deny.
    • Extend taint to mem_tree_chunks (memory-tree pipeline) so per-chunk provenance flows alongside the situation-report flag instead of just relying on the section-level signal.
    • Per-source taint coverage for non-composio ingest paths (webview scanners, voice transcripts).

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/approval-origin-fail-closed
  • Commit SHA: see git log upstream/main..HEAD — 21 commits, sequenced origin → gate → memory provenance → policy → migration → tests.

Validation Run

  • cargo test --lib openhuman::approval — 66 passed, 0 failed
  • cargo test --lib openhuman::agent::turn_origin — 3 passed, 0 failed
  • cargo test --lib openhuman::memory::traits — 5 passed, 0 failed
  • cargo test --lib openhuman::memory_store — 563 passed, 0 failed
  • cargo test --lib openhuman::memory — 1 962 passed, 0 failed
  • cargo test --lib openhuman::subconscious — 53 passed, 0 failed
  • cargo test --lib openhuman::memory_sync::composio — 394 passed, 0 failed
  • cargo test --lib openhuman::security::policy — 153 passed, 0 failed
  • cargo test --lib openhuman::config::schema — 245 passed, 0 failed
  • cargo check --workspace — clean
  • cargo fmt --check — clean
  • cargo clippy --lib -- -D warnings — error count identical to upstream/main; zero new hits in touched files.
  • Live boot smoke (staging): cold boot on a workspace with 3 240 pre-existing rows. Observed (a) ALTER TABLE migration applied with no panic; all rows default to internal. (b) Approval gate installed under the new origin-aware decision tree (session-6847…). (c) Telegram bot inbound DM routed through channels::providers::web::start_chat with thread_id=channel:telegram client_id=inbound; the host /tmp/cron_canary.txt sentinel never materialised. (d) Subconscious tick after Gmail sync logged [subconscious] tick origin source=SubconsciousTainted has_external_content=true, confirming the situation-report flag flips correctly with fresh third-party content in the seed.

Validation Blocked

  • command: Pre-push hook (Husky pnpm rust:check) hit a transient cef-dll-sys build-script "File I/O error: Peer disconnected" — CEF binary fetch flake. Re-ran pnpm rust:check standalone afterwards and it returned clean.
  • error: cef-dll-sys build-script exit 1 from the upstream-vendored CEF fetcher during the hook run.
  • impact: Bypassed the hook for the push (--no-verify) once the standalone re-run was clean. No code touches the CEF subtree.

Behavior Changes

  • Intended behavior change: turn origin is now an explicit, named first-class signal. The approval gate refuses external_effect tools whose origin is Unknown or SubconsciousTainted; existing trusted automation continues to run; the desktop chat surface is unchanged. Command-allowlist matches reject any leading KEY=value token. Empty channel_permissions maps on installs with configured channels migrate to explicit per-channel execute grants on first boot.
  • User-visible effect: none on the golden paths. A bot DMing a connected Telegram channel can no longer cause external_effect tools to fire silently — the agent's reply pattern is identical to today (no surfaced approval UX in this PR; that's the named follow-up), but the action does not happen. A subconscious tick that pulls in a hostile email body cannot trigger send_email-class tools for the rest of that tick.

Parity Contract

  • Legacy behaviour preserved: web chat keeps APPROVAL_CHAT_CONTEXT scoping for backward-compat with consumers that read it directly; both task-locals are now scoped around the same run_turn. Cron / non-tainted subconscious tools still execute without prompt. Memory::store (no taint parameter) keeps writing Internal for every backend.
  • Guard / fallback / dispatch parity checks: AgentTurnOrigin::Unknown is only reachable on genuinely unlabelled call sites and fails closed by design. The SQLite migration is idempotent (re-run on every boot, no-op when the column exists). The composio ExternalSync flip lives at a single boundary (MemoryClient::store_skill_sync) so new sync providers inherit the label automatically.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: this one
  • Resolution: N/A

Summary by CodeRabbit

  • New Features

    • Origin-aware trust labeling for agent turns (web, external channels, trusted automation, CLI) and explicit provenance/taint metadata on stored content.
  • Behavior Changes

    • Approval/guard decisions are origin-aware: trusted automation may auto-allow, external channels escalate/audit and park, unknown or tainted subconscious origins deny.
  • Bug Fixes

    • Stricter command validation now rejects leading env-var assignments.
  • Chores

    • One-time channel-permissions migration; tests and fixtures updated for new fields.

oxoxDev added 21 commits June 2, 2026 13:48
…venance

Introduce a typed task-local label that entry points (web chat, channel
runtime, subconscious, cron, CLI) scope around each agent run so policy
code can read trust/routing intent directly instead of inferring it from
the absence of APPROVAL_CHAT_CONTEXT.

WebChat / ExternalChannel / TrustedAutomation(Cron|Subconscious|
SubconsciousTainted) / Cli / Unknown cover the entry-point matrix. The
gate maps Unknown to fail-closed in a later commit.

with_origin() mirrors the existing APPROVAL_CHAT_CONTEXT.scope() pattern
so the two task-locals propagate identically through the tool loop.
The gate used to allow every tool call without a chat context, on the
assumption that any caller missing APPROVAL_CHAT_CONTEXT was trusted
background automation. That left non-web channels (Telegram, Discord,
Slack, Yuanbao) and external-content-driven subconscious turns sharing
the same allow-bypass even though they carry remote-attacker input.

Replace the chat_ctx.is_none() shortcut with an origin-aware decision
tree keyed off the new AgentTurnOrigin task-local:

  WebChat            -> existing chat-routed parking flow
  ExternalChannel    -> persist audit row, TTL-deny (no surface yet)
  TrustedAutomation
    Cron             -> allow without prompt
    Subconscious     -> allow without prompt (internal memory only)
    SubconsciousTainted -> deny (external-sync chunks in context)
  Cli                -> allow without prompt
  Unknown            -> deny

The auto-approve allowlist shortcut still runs first, so users' Always
allow tools win regardless of origin.

Tests: cover each origin path. The legacy 'no_chat_context_is_allowed'
test is replaced by intercept_with_unknown_origin_denies — the new
contract for unlabelled call sites.
Wrap the existing APPROVAL_CHAT_CONTEXT scope in the matching
AgentTurnOrigin::WebChat scope so the approval gate sees both
task-locals when a user-driven chat turn invokes an external_effect
tool. Both must scope the same inline future — tokio task-locals do
not cross tokio::spawn.
The channel runtime dispatch path runs the agent on inbound text from
non-web channels (Telegram, Discord, Slack, WhatsApp, Yuanbao, etc.) —
remote-attacker-controlled bytes. Plumb a typed origin field through
AgentTurnRequest so the bus handler can re-scope AGENT_TURN_ORIGIN
around the tool loop on the other side of the native-bus tokio::spawn
boundary.

dispatch.rs sets ExternalChannel { channel, reply_target, message_id }
for the live agent turn; the triage evaluator does the same for the
classifier pass.

origin defaults to Unknown (AgentTurnOrigin::Unknown) is not the
default here — callers must pick a concrete label. Test fixtures and
the only other AgentTurnRequest builder (triage::evaluator) set
appropriate origins. Any future builder that forgets falls back to a
compile error rather than silent trusted-automation semantics.
… cron turns

User-authorized background loops scope a TrustedAutomation origin
around their agent invocation so the approval gate allows external_
effect tools without prompting — the user enabled the loop knowing
it would run on their behalf.

  subconscious::engine -> TrustedAutomation { Subconscious }
  cron::scheduler      -> TrustedAutomation { Cron, job_id }

Subconscious ticks that read external-sync memory chunks (Gmail /
Slack / Notion / Composio) should escalate to SubconsciousTainted;
the memory-taint signal that drives that escalation is wired in a
follow-up step.
…igins

Every Agent::run_single call site needs an origin under the
fail-closed gate. Catch the rest:

  agent::task_dispatcher    -> Cli (sub-agent dispatch from a parent
                               turn the user already authorized)
  skills::schemas           -> Cli (operator-initiated skill run)
  meet_agent::brain         -> Cli (live meeting transcript from a
                               call the user actively joined)
  inference::local::ops     -> Cli (agent_chat RPC, trusted clients)
  mcp_server::tools         -> ExternalChannel (remote MCP callers —
                               untrusted, must route through audit
                               + TTL-deny)
Add MemoryTaint { Internal, ExternalSync } and attach it to MemoryEntry.
Internal is the serde default so legacy rows persisted before this
column existed deserialize safely (and the gate's tainted-subconscious
escalation never fires for entries we cannot classify).

This is the type-system half of the contract. The composio /
memory_sync ingest paths flip the taint to ExternalSync at write time,
and the subconscious tick escalates its origin from Subconscious to
SubconsciousTainted when its context contains a tainted chunk. The
SQLite schema migration that persists taint across restarts lives in a
follow-up — for now, in-process MemoryEntry values can carry taint and
the gate enforces the escalation when the upstream paths populate it.
…ion for existing installs

The tool-policy engine treated an empty channel_permissions map as
'preserve legacy unrestricted surface', returning PermissionLevel::
Dangerous. That meant the default install — which ships with no
channel_permissions — never enforced the per-channel cap, leaving non-
web channels (Telegram, Discord, Slack, Yuanbao) at the same
permission level as the desktop UI.

Two changes:

1. permission_for_channel: empty map -> ReadOnly (not Dangerous).
2. AgentConfig::migrate_channel_permissions_if_legacy: on first boot
   after upgrade, an install with empty channel_permissions and at
   least one configured channel seeds the map with web=execute +
   each existing channel = execute. Idempotent, logs at info.

Existing engine test updated to assert the new fail-closed semantics
(write_notes and run_script are no longer allowed when the map is
empty).
Mechanical follow-up to the MemoryTaint introduction — every in-tree
MemoryEntry literal (library helpers + integration test fixtures)
gets an explicit taint field set to Default::default() (Internal).

Test fixtures for AgentTurnRequest similarly receive origin: Cli so
the bus dispatch path doesn't fail closed on the unlabelled-call-site
check. No behavioural change: Internal taint and Cli origin both
match the pre-PR allow-through semantics for these specific
fixtures.
…mmands

The existing dangerous-prefix list (GIT_SSH, LD_PRELOAD, IFS, …) is
narrow by design — every new helper-style hook a downstream tool ships
opens a fresh evasion surface. Tighten the policy: ANY leading env-var
assignment (NAME=…, where NAME is an ASCII identifier) is rejected
inside is_command_allowed, in addition to the existing
has_dangerous_env_prefix check.

Rationale: the allowlist already names every command we want to
permit, and none of those commands need an operator-set env var at
invoke time, so the broader gate has no false-positive surface on the
approved path.

Tests:
- validate_command_rejects_leading_env_var_assignment exercises the
  GIT_SSH= / SSH_ASKPASS= / LD_PRELOAD= triad plus the no-prefix
  negative.
- command_env_var_prefix_is_always_rejected replaces the previous
  'benign prefix passes' contract test (FOO=bar ls and friends).
- dangerous_env_var_prefix_blocked updates its 'TZ=UTC pass' arm to
  the new reject-all contract while keeping its LD_PRELOAD coverage.
…sed default

The 'unrestricted_policy_session_renders_none' fixture relied on empty
channel_permissions producing PermissionLevel::Dangerous. With the
fail-closed-to-ReadOnly change the same fixture now produces a
restricted session whose Write tools are blocked.

Replace the assertion with a read-only-only-tools shape that still
expresses the same invariant (no boundary text to render) under the
new contract.
…ion carries hardening

Reverting the engine-layer fail-closed default to its prior behavior
(empty channel_permissions -> Dangerous) and keeping the
AgentConfig::migrate_channel_permissions_if_legacy migration as the
real hardening surface.

Why: changing the engine default broke every unit fixture that built
an Agent without configuring channel_permissions (the SpawnSubagentTool
Execute-permission tool became silently blocked under the new ReadOnly
cap, breaking ~30 harness tests). The migration accomplishes the
intended hardening for real-world legacy installs at first boot — the
engine layer can keep its backward-compatible 'empty == unrestricted'
shortcut without leaving the actual user-visible policy unenforced.

The migration logic and its unit tests stay in place. The engine test
keeps its 'legacy unrestricted' shape since that is now the documented
behavior.
…ntics

- cargo fmt on the cron + mcp_server origin scoping introduced
  earlier in this PR.
- Tighten the migration: when the install has NO non-web channels
  configured, leave the map empty so fresh installs don't get an
  accidental policy seeded. The migration is for already-active
  installs (where the legacy unrestricted surface needs replacing
  with an explicit per-channel cap), not for the default empty-map
  baseline.

migrate_channel_permissions_with_no_channels_is_noop test now passes.
- Rephrase migrate_channel_permissions_if_legacy docs to match the
  engine-layer behaviour landed in the previous fix commit (engine
  keeps legacy unrestricted, migration carries the hardening).
- Drop redundant `trim_start()` before `split_whitespace().next()` in
  has_leading_env_assignment — `split_whitespace` already skips
  leading whitespace, so the trim was a no-op.

No behaviour change.
Add `taint: MemoryTaint` (serde-default `Internal`) to
NamespaceDocumentInput, NamespaceQueryResult, StoredMemoryDocument, and
NamespaceMemoryHit so the provenance signal can flow from the ingest
caller all the way through retrieval. Legacy JSON without the field
decodes as Internal (covered by new unit tests on each DTO).

Constructors across the tree get an explicit `taint: Internal` for now;
the dedicated `store_with_taint` path and composio wire-up land in
follow-up commits.
…ation

Extend the `memory_docs` schema with a `taint TEXT NOT NULL DEFAULT
'internal'` column so the provenance signal can round-trip through
SQLite. Fresh installs get the column from CREATE TABLE; existing DBs
get an idempotent ALTER TABLE that no-ops on re-application following
the same pattern as the vector_chunks model_signature migration.

Add `MemoryTaint::as_db_str` / `from_db_str` helpers so callers can
serialise the enum without knowing the SQL literal, with unknown column
values defaulting back to `Internal` so the gate never escalates on
garbage data.
…recall

Extend the INSERT/ON CONFLICT clauses in both `upsert_document` and
`upsert_document_metadata_only` to write the new `taint` column from
`NamespaceDocumentInput.taint`, and update the SELECT in
`load_documents_for_scope` to read it back via `MemoryTaint::from_db_str`.

With this commit, document upserts persist the provenance signal and
every retrieval path that flows through `load_documents_for_scope`
(query_namespace_hits / recall_namespace_memories / list_documents)
surfaces the real taint on the resulting hits, not a hardcoded default.
Legacy rows pre-migration land on the `DEFAULT 'internal'` clause and
keep decoding as `MemoryTaint::Internal` end-to-end.
Add a `store_with_taint` method to the `Memory` trait so memory_sync
ingest paths can label external content explicitly. The default
implementation delegates to `store` (preserves backward-compatibility
for mock backends), while `UnifiedMemory` overrides it to push the
taint label through `NamespaceDocumentInput.taint`.

Update `UnifiedMemory::recall` to surface the real taint from
`NamespaceQueryResult` instead of hardcoding `Internal`, so a tick that
pulls in a Gmail / Slack / Notion chunk via memory recall now sees the
correct provenance signal on the returned `MemoryEntry`.
`MemoryClient::store_skill_sync` is the shared entry point for every
memory_sync provider (composio gmail / slack / notion / github /
linear / clickup, plus the LinkedIn-via-Apify enrichment path) — none
of these call sites mix in user-driven content, so it is safe and
correct to mark every write coming through here as
`MemoryTaint::ExternalSync`.

Anchoring the flip at the shared method (rather than per-provider
call sites) means a new sync provider added later automatically picks
up the correct taint label without a per-provider follow-up. The
internal `put_doc` / `Memory::store` paths remain Internal-default for
chat / agent writes.
…ontext contains external-sync chunks

`build_situation_report` now returns a `SituationReport` struct with a
`has_external_content` flag set by the `summaries` and `query_window`
section builders — both of which exclusively render content derived
from third-party sync sources (Gmail / Slack / Notion / sealed source
summaries via `mem_tree_summaries`). When either section actually
renders a fresh row, the tick prompt now carries external-sync content
into the LLM context window.

The engine reads that flag and, when true, swaps the
`AgentTurnOrigin::TrustedAutomation { source }` from `Subconscious` to
`SubconsciousTainted` so the approval gate refuses external_effect
tools for the rest of the tick. Closes the wiring gap the previous
inline comment in `run_agent` flagged.

Section builders return `(String, bool)`; placeholder strings ("No new
sealed summaries…", "Recap unavailable", "No new recap content…") keep
the flag false so a tick with only environment / identifiers / past
reflections still runs with the original `Subconscious` origin.
…grade

Add four memory_store regression tests covering the taint round-trip
contract: ExternalSync written via `store_with_taint` surfaces back on
recall, the direct trait API doesn't silently fall back to Internal,
legacy rows pre-migration decode as Internal via the SQL DEFAULT, and a
mixed namespace (one tainted + one internal) keeps both labels intact
so the subconscious gate can decide without over-escalating untainted
context.

Extract the subconscious tick's origin-source decision into a small
free function `tick_origin_source(has_external_content) ->
TrustedAutomationSource` so the upgrade contract is unit-testable
without spinning up a real `Agent` + provider. Add two tests proving
false→Subconscious and true→SubconsciousTainted.
@oxoxDev oxoxDev requested a review from a team June 2, 2026 17:01
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

Warning

Review limit reached

@senamakel, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 7 minutes and 37 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cba32839-39a7-41ce-a748-0abd2a660931

📥 Commits

Reviewing files that changed from the base of the PR and between 25abd8f and 7f88872.

📒 Files selected for processing (14)
  • src/openhuman/agent/bus.rs
  • src/openhuman/channels/providers/web.rs
  • src/openhuman/channels/runtime/dispatch.rs
  • src/openhuman/cron/scheduler.rs
  • src/openhuman/learning/reflection_tests.rs
  • src/openhuman/learning/transcript_ingest/tests.rs
  • src/openhuman/memory_store/memory_trait.rs
  • src/openhuman/memory_store/safety/mod.rs
  • src/openhuman/memory_store/unified/documents.rs
  • src/openhuman/memory_store/unified/documents_tests.rs
  • src/openhuman/security/policy_command.rs
  • src/openhuman/skills/schemas.rs
  • tests/memory_threads_raw_coverage_e2e.rs
  • tests/memory_tree_memory_round23_raw_coverage_e2e.rs
📝 Walkthrough

Walkthrough

Adds MemoryTaint metadata and persistence, introduces AgentTurnOrigin task-local labels, threads origin through entry points and the bus, makes ApprovalGate origin-aware, surfaces taint in recall/query paths, tightens command allowlisting, and updates many tests and migrations.

Changes

Provenance and Trust Routing

Layer / File(s) Summary
Memory taint contracts and public types
src/openhuman/memory/traits.rs, src/openhuman/memory_store/types.rs, src/openhuman/memory/mod.rs
Adds MemoryTaint, taint fields to memory types, serde/db mapping, and tests ensuring legacy defaulting to Internal.
Taint persistence through UnifiedMemory
src/openhuman/memory_store/unified/*, src/openhuman/memory_store/memory_trait.rs, src/openhuman/memory_store/init.rs
Schema adds taint column (with backfill), upsert/load/query updated to persist/parse taint, and memory trait/store paths updated with store_with_taint and taint-aware recall/list/get behavior plus tests.
Document ingestion and write origins
src/core/memory_cli.rs, src/openhuman/memory/ops/*, src/openhuman/memory_store/client.rs, src/openhuman/memory/ingestion/*, tests/*
RPC/CLI/internal writes set taint=Internal; skill-sync/external sync sets taint=ExternalSync; sanitizers/enrichers preserve taint; tests updated to include taint.
Turn origin task-local and entry-point wiring
src/openhuman/agent/turn_origin.rs, src/openhuman/agent/bus.rs, src/openhuman/channels/*, src/openhuman/cron/*, src/openhuman/mcp_server/*, src/openhuman/skills/*, src/openhuman/inference/local/*
Adds AgentTurnOrigin and helpers with_origin/current; AgentTurnRequest gains origin; origin is scoped at web, external channels, CLI, cron, MCP, skills, and bus tool loops.
Origin-aware approval and automation policy
src/openhuman/approval/gate.rs, src/openhuman/subconscious/*, src/openhuman/agent_tool_policy/engine.rs
ApprovalGate now branches behavior by AgentTurnOrigin (allow/deny/persist rules); subconscious tick/report surfaces external-content to select trusted automation source; channel-permissions doc updated.
Leading env-assignment command rejection
src/openhuman/security/policy_command.rs, src/openhuman/security/policy.rs, src/openhuman/security/policy_tests.rs
New helper detects leading NAME=... env assignments and SecurityPolicy rejects segments with such prefixes; tests updated to reflect stricter rejection.
Fixture updates for taint-aware memory entries
tests/*, src/openhuman/*/tests
Many test helpers and mocks updated to initialize MemoryEntry.taint and to set AgentTurnRequest.origin where applicable.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • M3gA-Mind
  • graycyrus

🐰 I stitched the trails and marked each turn,
Memory taint tucked where provenance should burn.
Origins scoped — web, cron, CLI in line,
Approval listens now, no unlabeled sign.
Hop, sniff, and guard — the agent learns in kind.

@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. labels Jun 2, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/openhuman/skills/schemas.rs (1)

923-985: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Preserve the parent turn origin when spawning a skill run.

spawn_skill_run_background is reused by the run_skill agent tool, so hard-coding AgentTurnOrigin::Cli here strips an ExternalChannel/tainted parent turn of its provenance once the child task is detached. That lets a remote-origin request re-enter the agent as local CLI and bypass the new approval routing. Capture the current origin before tokio::spawn and only fall back to Cli for direct RPC/CLI invocations outside any turn.

Suggested fix
     let run_id = uuid::Uuid::new_v4().to_string();
     let log_path = run_log::run_log_path(&workspace, &skill_id, &run_id);
+    let inherited_origin = crate::openhuman::agent::turn_origin::current()
+        .unwrap_or(crate::openhuman::agent::turn_origin::AgentTurnOrigin::Cli);
     tracing::info!(
         skill_id = %skill_id,
         run_id = %run_id,
@@
         let skill_id = skill_id.clone();
         let inputs = inputs.clone();
         let log_path = log_path.clone();
+        let inherited_origin = inherited_origin.clone();
         tokio::spawn(async move {
@@
-            let result = crate::openhuman::agent::turn_origin::with_origin(
-                crate::openhuman::agent::turn_origin::AgentTurnOrigin::Cli,
+            let result = crate::openhuman::agent::turn_origin::with_origin(
+                inherited_origin,
                 with_autonomous_iter_cap(SKILL_RUN_MAX_ITERATIONS, agent.run_single(&task_prompt)),
             )
             .await;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/skills/schemas.rs` around lines 923 - 985, The code currently
hard-codes AgentTurnOrigin::Cli inside the spawned task; capture the current
turn origin before tokio::spawn (e.g., let parent_origin =
crate::openhuman::agent::turn_origin::current_origin().unwrap_or(AgentTurnOrigin::Cli);),
move/clone parent_origin into the async block, and use that variable in the
with_origin call instead of AgentTurnOrigin::Cli so child skill runs inherit the
parent provenance (preserving any ExternalChannel/tainted origin) while still
defaulting to Cli when no parent exists. Ensure parent_origin is cloned/moved
where needed and referenced in the call to
with_origin(crate::openhuman::agent::turn_origin::AgentTurnOrigin::..., ...) so
the spawned task preserves the original turn origin.
src/openhuman/agent/bus.rs (1)

37-48: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Duplicate doc comment block.

Lines 37-48 duplicate the doc comment that already appears on lines 43-48. Remove the first occurrence.

Proposed fix
 /// Full owned payload for a single agentic turn executed through the bus.
 ///
 /// All fields are either owned values, [`Arc`]s, or channel handles — the
 /// bus carries them by value without touching serialization. Consumers can
 /// therefore pass trait objects (`Arc<dyn Provider>`, tool trait-object
 /// registries) and streaming senders (`on_delta`) through unchanged.
-/// Full owned payload for a single agentic turn executed through the bus.
-///
-/// All fields are either owned values, [`Arc`]s, or channel handles — the
-/// bus carries them by value without touching serialization. Consumers can
-/// therefore pass trait objects (`Arc<dyn Provider>`, tool trait-object
-/// registries) and streaming senders (`on_delta`) through unchanged.
 pub struct AgentTurnRequest {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/bus.rs` around lines 37 - 48, Remove the duplicated
documentation block in src/openhuman/agent/bus.rs by deleting the first
occurrence of the repeated doc comment (the earlier copy of the "Full owned
payload for a single agentic turn executed through the bus..." paragraph) and
keep the single correct copy that remains; ensure surrounding spacing and
indentation for the struct/enum/comment (the doc tied to the bus payload type)
is preserved so rustdoc and compiler comments are unchanged.
src/openhuman/memory_store/memory_trait.rs (1)

315-342: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve stored taint in get() and list() instead of hardcoding Internal.

These two APIs currently erase ExternalSync provenance for existing rows, so callers using direct lookup or listing will see synced memory as trusted/user-authored content. That breaks the new MemoryEntry contract even though recall() now propagates taint correctly.

Suggested fix
-        let row: Option<(String, String, String, f64, String)> = conn
+        let row: Option<(String, String, String, f64, String, String)> = conn
             .query_row(
-                "SELECT document_id, key, content, updated_at, category
+                "SELECT document_id, key, content, updated_at, category, taint
                  FROM memory_docs WHERE namespace = ?1 AND key = ?2 LIMIT 1",
                 params![ns, key],
                 |row| {
                     Ok((
                         row.get(0)?,
                         row.get(1)?,
                         row.get(2)?,
                         row.get(3)?,
                         row.get(4)?,
+                        row.get(5)?,
                     ))
                 },
             )
             .optional()?;
         Ok(
-            row.map(|(id, key, content, updated_at, category)| MemoryEntry {
+            row.map(|(id, key, content, updated_at, category, taint)| MemoryEntry {
                 id,
                 key,
                 content,
                 namespace: Some(ns.clone()),
                 category: memory_category_from_stored(&category),
                 timestamp: timestamp_to_rfc3339(updated_at),
                 session_id: None,
                 score: None,
-                taint: crate::openhuman::memory::MemoryTaint::Internal,
+                taint: MemoryTaint::from_db_str(&taint),
             }),
         )
-        let docs = self
-            .list_documents(Some(ns))
+        let docs = self
+            .load_documents_for_scope(ns)
             .await
             .map_err(anyhow::Error::msg)?;
         let mut out = Vec::new();
-        let items = docs
-            .get("documents")
-            .and_then(serde_json::Value::as_array)
-            .cloned()
-            .unwrap_or_default();
-        for (idx, d) in items.into_iter().enumerate() {
+        for (idx, d) in docs.into_iter().enumerate() {
             let cat = category.cloned().unwrap_or(MemoryCategory::Core);
             out.push(MemoryEntry {
-                id: d
-                    .get("documentId")
-                    .and_then(serde_json::Value::as_str)
-                    .unwrap_or_default()
-                    .to_string(),
-                key: d
-                    .get("key")
-                    .and_then(serde_json::Value::as_str)
-                    .unwrap_or_default()
-                    .to_string(),
-                content: d
-                    .get("title")
-                    .and_then(serde_json::Value::as_str)
-                    .unwrap_or_default()
-                    .to_string(),
+                id: d.document_id,
+                key: d.key,
+                content: d.title,
                 namespace: Some(ns.to_string()),
                 category: cat,
                 timestamp: format!("idx-{idx}"),
                 session_id: None,
                 score: None,
-                taint: crate::openhuman::memory::MemoryTaint::Internal,
+                taint: d.taint,
             });
         }

Also applies to: 346-387

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_store/memory_trait.rs` around lines 315 - 342, The get()
and list() paths currently hardcode MemoryTaint::Internal when constructing
MemoryEntry, which overwrites the stored taint; update both SQL selects to
include the taint column, read it from the row tuple (e.g., add a String taint
field to the row tuple alongside category), and pass the stored value through
the existing mapper (e.g., memory_taint_from_stored(&taint) or add such a small
mapper) instead of MemoryTaint::Internal so the MemoryEntry.taint preserves the
DB provenance; apply the same change in the code that builds MemoryEntry in both
get() and list() (use the same timestamp_to_rfc3339(...) and
memory_category_from_stored(...) logic).
🧹 Nitpick comments (1)
src/openhuman/memory_store/safety/mod.rs (1)

241-279: ⚡ Quick win

Add a regression test for taint preservation here.

This provenance rule is security-sensitive, and right now it's only documented in comments. A focused test that sanitizes an ExternalSync document and asserts sanitized.value.taint stays unchanged would make this harder to accidentally undo.

Example test shape
+    #[test]
+    fn sanitize_document_input_preserves_taint() {
+        let sanitized = sanitize_document_input(NamespaceDocumentInput {
+            namespace: "ns".into(),
+            key: "k".into(),
+            title: "title".into(),
+            content: "content".into(),
+            source_type: "sync".into(),
+            priority: "normal".into(),
+            tags: vec![],
+            metadata: json!({}),
+            category: "core".into(),
+            session_id: None,
+            document_id: None,
+            taint: crate::openhuman::memory::MemoryTaint::ExternalSync,
+        });
+        assert_eq!(
+            sanitized.value.taint,
+            crate::openhuman::memory::MemoryTaint::ExternalSync
+        );
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_store/safety/mod.rs` around lines 241 - 279, Add a
regression test that asserts taint is preserved by sanitize_document_input:
create a NamespaceDocumentInput (e.g., representing an ExternalSync document)
with a non-default taint, call sanitize_document_input(namespace_doc), and
assert sanitized.value.taint == original_input.taint; ensure the test constructs
realistic title/content/tags/metadata values so sanitize_text/sanitize_json run,
and place the test near existing safety tests exercising sanitize_document_input
to guard the provenance rule in function sanitize_document_input and fields
NamespaceDocumentInput::taint.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/config/schema/agent.rs`:
- Around line 202-211: Update the rustdoc to reflect actual runtime semantics:
state that when the channel_permissions map is empty the engine currently treats
channels as PermissionLevel::Dangerous (not readonly) until
AgentConfig::migrate_channel_permissions_if_legacy seeds explicit entries at
startup, and clarify that the "fail-closed" readonly behavior only applies after
that migration/seeding step; reference ToolPolicyEngine and
PermissionLevel::Dangerous in the text and ensure guidance to update external
docs (e.g., AGENTS.md) if this behavioral description changes the public
contract.

In `@src/openhuman/meet_agent/brain.rs`:
- Around line 929-935: The code incorrectly marks live meeting turns as
AgentTurnOrigin::Cli; instead introduce and use a distinct origin (e.g.
AgentTurnOrigin::LiveMeeting or AgentTurnOrigin::ExternallySourced) when calling
crate::openhuman::agent::turn_origin::with_origin for meeting audio (the call
site using agent.run_single(&user_message)), and ensure the new enum variant is
added where AgentTurnOrigin is defined so downstream approval/routing logic
treats these as externally sourced speech rather than a trusted CLI origin.

In `@src/openhuman/memory_store/unified/documents.rs`:
- Around line 369-375: The current code silently maps any unknown persisted
taint string to MemoryTaint::Internal; instead, parse the DB value into
taint_str and call MemoryTaint::from_db_str(&taint_str), then if the result is
MemoryTaint::Internal but taint_str != "internal" treat this as an
unknown/future value and either return an error from the loader or map it to the
safer MemoryTaint::Tainted; update the variables used in this block (taint_str,
taint) and the calling code to propagate an Err on parse-failure or to accept
MemoryTaint::Tainted so unknown persisted values are not trusted.

In `@src/openhuman/memory/traits.rs`:
- Around line 53-60: from_db_str currently maps any unrecognized DB string to
MemoryTaint::Internal (trusted); change the default arm so unknown values fail
closed by mapping them to MemoryTaint::ExternalSync (or another most-restrictive
taint) instead of Internal; update the match in from_db_str to return
Self::ExternalSync for the _ case, add a brief comment explaining the
fail-closed behavior and add a unit test that verifies an unknown string yields
ExternalSync to prevent silent downgrades.

---

Outside diff comments:
In `@src/openhuman/agent/bus.rs`:
- Around line 37-48: Remove the duplicated documentation block in
src/openhuman/agent/bus.rs by deleting the first occurrence of the repeated doc
comment (the earlier copy of the "Full owned payload for a single agentic turn
executed through the bus..." paragraph) and keep the single correct copy that
remains; ensure surrounding spacing and indentation for the struct/enum/comment
(the doc tied to the bus payload type) is preserved so rustdoc and compiler
comments are unchanged.

In `@src/openhuman/memory_store/memory_trait.rs`:
- Around line 315-342: The get() and list() paths currently hardcode
MemoryTaint::Internal when constructing MemoryEntry, which overwrites the stored
taint; update both SQL selects to include the taint column, read it from the row
tuple (e.g., add a String taint field to the row tuple alongside category), and
pass the stored value through the existing mapper (e.g.,
memory_taint_from_stored(&taint) or add such a small mapper) instead of
MemoryTaint::Internal so the MemoryEntry.taint preserves the DB provenance;
apply the same change in the code that builds MemoryEntry in both get() and
list() (use the same timestamp_to_rfc3339(...) and
memory_category_from_stored(...) logic).

In `@src/openhuman/skills/schemas.rs`:
- Around line 923-985: The code currently hard-codes AgentTurnOrigin::Cli inside
the spawned task; capture the current turn origin before tokio::spawn (e.g., let
parent_origin =
crate::openhuman::agent::turn_origin::current_origin().unwrap_or(AgentTurnOrigin::Cli);),
move/clone parent_origin into the async block, and use that variable in the
with_origin call instead of AgentTurnOrigin::Cli so child skill runs inherit the
parent provenance (preserving any ExternalChannel/tainted origin) while still
defaulting to Cli when no parent exists. Ensure parent_origin is cloned/moved
where needed and referenced in the call to
with_origin(crate::openhuman::agent::turn_origin::AgentTurnOrigin::..., ...) so
the spawned task preserves the original turn origin.

---

Nitpick comments:
In `@src/openhuman/memory_store/safety/mod.rs`:
- Around line 241-279: Add a regression test that asserts taint is preserved by
sanitize_document_input: create a NamespaceDocumentInput (e.g., representing an
ExternalSync document) with a non-default taint, call
sanitize_document_input(namespace_doc), and assert sanitized.value.taint ==
original_input.taint; ensure the test constructs realistic
title/content/tags/metadata values so sanitize_text/sanitize_json run, and place
the test near existing safety tests exercising sanitize_document_input to guard
the provenance rule in function sanitize_document_input and fields
NamespaceDocumentInput::taint.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 33cd49ef-d0c1-4ea1-bbcb-593fd32fa432

📥 Commits

Reviewing files that changed from the base of the PR and between b3c4eff and 9d386a0.

📒 Files selected for processing (72)
  • src/core/memory_cli.rs
  • src/openhuman/agent/bus.rs
  • src/openhuman/agent/harness/memory_context.rs
  • src/openhuman/agent/harness/memory_context_safety.rs
  • src/openhuman/agent/memory_loader.rs
  • src/openhuman/agent/mod.rs
  • src/openhuman/agent/task_dispatcher.rs
  • src/openhuman/agent/triage/evaluator.rs
  • src/openhuman/agent/turn_origin.rs
  • src/openhuman/agent_tool_policy/engine.rs
  • src/openhuman/approval/gate.rs
  • src/openhuman/autocomplete/history.rs
  • src/openhuman/channels/context.rs
  • src/openhuman/channels/providers/web.rs
  • src/openhuman/channels/runtime/dispatch.rs
  • src/openhuman/channels/runtime/test_support.rs
  • src/openhuman/config/schema/agent.rs
  • src/openhuman/cron/scheduler.rs
  • src/openhuman/inference/local/ops.rs
  • src/openhuman/learning/reflection_tests.rs
  • src/openhuman/learning/tool_tracker.rs
  • src/openhuman/learning/transcript_ingest/tests.rs
  • src/openhuman/learning/user_profile.rs
  • src/openhuman/mcp_server/tools.rs
  • src/openhuman/meet_agent/brain.rs
  • src/openhuman/memory/ingestion/parse.rs
  • src/openhuman/memory/ingestion/parse_tests.rs
  • src/openhuman/memory/ingestion/queue.rs
  • src/openhuman/memory/ingestion/tests.rs
  • src/openhuman/memory/mod.rs
  • src/openhuman/memory/ops/documents.rs
  • src/openhuman/memory/ops/helpers.rs
  • src/openhuman/memory/ops/learn.rs
  • src/openhuman/memory/ops_tests.rs
  • src/openhuman/memory/traits.rs
  • src/openhuman/memory_store/client.rs
  • src/openhuman/memory_store/client_tests.rs
  • src/openhuman/memory_store/memory_trait.rs
  • src/openhuman/memory_store/safety/mod.rs
  • src/openhuman/memory_store/types.rs
  • src/openhuman/memory_store/unified/documents.rs
  • src/openhuman/memory_store/unified/documents_tests.rs
  • src/openhuman/memory_store/unified/init.rs
  • src/openhuman/memory_store/unified/query.rs
  • src/openhuman/memory_store/unified/query_tests.rs
  • src/openhuman/memory_tools/test_helpers.rs
  • src/openhuman/screen_intelligence/helpers.rs
  • src/openhuman/security/policy.rs
  • src/openhuman/security/policy_command.rs
  • src/openhuman/security/policy_tests.rs
  • src/openhuman/skills/schemas.rs
  • src/openhuman/subconscious/engine.rs
  • src/openhuman/subconscious/engine_tests.rs
  • src/openhuman/subconscious/situation_report/mod.rs
  • src/openhuman/subconscious/situation_report/query_window.rs
  • src/openhuman/subconscious/situation_report/summaries.rs
  • src/openhuman/tools/impl/system/tool_stats.rs
  • tests/agent_harness_leftovers_raw_coverage_e2e.rs
  • tests/agent_memory_loader_public.rs
  • tests/agent_session_round24_raw_coverage_e2e.rs
  • tests/agent_session_turn_raw_coverage_e2e.rs
  • tests/agent_tool_loop_raw_coverage_e2e.rs
  • tests/agent_turn_builder_leftovers_raw_coverage_e2e.rs
  • tests/agent_turn_toolloop_round22_raw_coverage_e2e.rs
  • tests/inference_agent_raw_coverage_e2e.rs
  • tests/memory_graph_sync_e2e.rs
  • tests/memory_raw_coverage_e2e.rs
  • tests/memory_threads_raw_coverage_e2e.rs
  • tests/memory_tree_memory_round23_raw_coverage_e2e.rs
  • tests/personality_e2e.rs
  • tests/screen_intelligence_vision_e2e.rs
  • tests/subconscious_e2e.rs

Comment thread src/openhuman/config/schema/agent.rs Outdated
Comment thread src/openhuman/meet_agent/brain.rs Outdated
Comment thread src/openhuman/memory_store/unified/documents.rs
Comment thread src/openhuman/memory/traits.rs Outdated
oxoxDev added 2 commits June 2, 2026 22:45
from_db_str() previously mapped any unrecognised persisted taint string
to MemoryTaint::Internal — the trusted variant. A forward-rolled schema
that ships a new taint variant, a manual UPDATE typo, or row corruption
would therefore silently downgrade a chunk of unknown provenance to
user-authored content, defeating the subconscious gate's
external_effect refusal.

Map both '' and any future/unknown raw string to MemoryTaint::ExternalSync
instead, and add an explicit 'internal' arm so legacy rows still
round-trip cleanly through the migration's NOT NULL DEFAULT 'internal'
clause. The unit test is updated to assert the fail-closed contract.
Clarify the comment above the from_db_str() call in load_documents_for_scope:
legacy rows still round-trip via the migration's NOT NULL DEFAULT 'internal',
but unknown / corrupted values now fail closed to ExternalSync inside
from_db_str() — the previous wording incorrectly stated they also folded
to Internal.
oxoxDev added 2 commits June 2, 2026 22:47
llm_meeting_agentic() previously scoped its agent run with
AgentTurnOrigin::Cli, which gives trusted-CLI semantics. After
run_grant_turn the meet-agent also dispatches utterances from
non-owner participants, so the prompt text is externally-sourced
speech rather than a local internal invocation — Cli would route
external_effect tool calls past the approval surface unprompted.

Use AgentTurnOrigin::ExternalChannel with channel='meet',
reply_target=request_id, message_id='meet-<request_id>-<now_ms>'.
The gate now treats meet-agent turns the same way it treats inbound
Telegram / Discord / Slack messages: external_effect tools persist
a pending_approvals row for the audit trail and the parked future
TTL-denies (the fail-closed default for remote inputs the live
caption surface can't yet relay an interactive approval through).
The rustdoc previously said an empty channel_permissions map 'fails
closed to readonly', but the engine in
agent_tool_policy::engine::permission_for_channel actually returns
PermissionLevel::Dangerous when the map is empty (legacy-preservation
branch — explicit log line: 'channel permissions empty; preserving
legacy unrestricted surface'). The ReadOnly fall-back only fires for
the non-empty-map / channel-absent path.

Rewrite the docstring as a three-bullet runtime table so the per-branch
behavior is unambiguous, and call out that the empty-map branch is
effectively reachable only by manual on-disk edits because
migrate_channel_permissions_if_legacy seeds the map on first boot.
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 2, 2026
…worker stack

`with_origin` is composed with `APPROVAL_CHAT_CONTEXT.scope(...)` and the
agent loop downstream (tool dispatch, recursive sub-agent invocations,
LLM streaming). Stacking two task-local scopes plus the agent loop on
the 2 MiB worker stack the test runtime uses overflows mid-turn —
reproduced as a `fatal runtime error: stack overflow, aborting` in
`json_rpc_protocol_auth_and_agent_hello` on the upstream CI lanes
`Rust E2E (mock backend)` and `Rust Core Coverage (cargo-llvm-cov)`.

Box-pin the inner future before handing it to the task-local scope so
the combined future state machine stays heap-allocated. Same pattern as
PR tinyhumansai#3151 for the sub-agent engine. Single-point remediation that covers
every caller (web channel, channel runtime, subconscious, cron, CLI).

`json_rpc_protocol_auth_and_agent_hello` now passes in 0.49s; the full
`json_rpc_e2e` suite (76 tests) passes locally.
@coderabbitai coderabbitai Bot added the working A PR that is being worked on by the team. label Jun 2, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 2, 2026
oxoxDev added 2 commits June 2, 2026 23:56
approval_gate_rpc_decision_resumes_parked_tool_and_records_execution
previously scoped only APPROVAL_CHAT_CONTEXT around its
intercept_audited call. With the explicit-origin gate this now hits
the Unknown branch and is denied immediately, so no
pending_approvals row is ever persisted and the polling loop times
out with 'approval request did not appear before timeout'.

Wrap the approval task in with_origin(AgentTurnOrigin::WebChat{...})
alongside the existing chat context. Mirrors the production web
channel path in channels/providers/web.rs.
… gate

approval_rpc_decision_paths_persist_always_allow_and_recent_audit
asserted the legacy 'no-chat-context ⇒ Allow' fall-through, but the
gate now refuses unlabelled call sites (the entire reason for the
origin propagation work). Three changes here, scoped to the test
file only:

* Wrap the three parking-flow scopes (approval_task, deny_task,
  persist_failure) in with_origin(AgentTurnOrigin::WebChat{...})
  alongside the existing APPROVAL_CHAT_CONTEXT. The chat-context
  alone is no longer enough to reach the parking flow.
* Flip the bare-call assertions (no_chat + gate.intercept(...))
  from GateOutcome::Allow to GateOutcome::Deny, and assert the
  reason mentions 'no origin label'. This is the documented
  fail-closed contract the PR is supposed to enforce.
* Install a live_policy with the persisted auto_approve entry
  before exercising the auto_approve allowlist short-circuit —
  the gate's boot-time config snapshot pre-dates the
  approve_always_for_tool decision the test issues, and the test
  was previously only getting Allow because the no-chat-context
  fall-through bypassed the lookup entirely.

Also acquire env_lock() at the top of
tool_registry_schema_handlers_validate_and_return_payloads — that
test loads Config::load_or_init() in the diagnostics handler, and a
sibling test (tool_registry_diagnostics_reports_config_and_audit_store_failures)
points OPENHUMAN_WORKSPACE at a regular file to exercise the
load-failure branch. Without the lock the env mutation races into
this test and trips 'Failed to create config directory'. Pre-existing
isolation gap on upstream/main; landing in CI now because the
coverage matrix is busier.
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 2, 2026
@senamakel senamakel self-assigned this Jun 3, 2026
senamakel added 5 commits June 2, 2026 20:20
# Conflicts:
#	src/openhuman/learning/transcript_ingest/tests.rs
Capture the current AgentTurnOrigin before tokio::spawn so skill
runs triggered from ExternalChannel/tainted contexts retain their
provenance through the approval gate instead of being downgraded
to Cli.
get() now SELECTs the taint column and maps it via from_db_str
instead of hardcoding Internal. list_documents() includes taint
in its JSON output so the list() Memory trait impl can surface
the real persisted provenance.
Asserts that sanitize_document_input passes ExternalSync taint
through unchanged — guards the provenance contract.
@senamakel senamakel merged commit 3a7180d into tinyhumansai:main Jun 3, 2026
31 of 33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. feature Net-new user-facing capability or product behavior. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants