fix(memory): group selector ingests by source scope by senamakel · Pull Request #3264 · tinyhumansai/openhuman

senamakel · 2026-06-03T02:36:17Z

Summary

Preserve item-level memory document identity for selector ingests while grouping graph/source archive identity by stable source scope.
Persist metadata.path_scope in mem_tree_chunks so extraction workers and reloaded chunks keep source grouping intact.
Update Notion, GitHub, Linear, and ClickUp Composio ingests to use connection-level source scope instead of per-page/per-issue/per-task graph sources.
Seed Obsidian source/... tags from path_scope when present, preventing one raw source folder/tag per selected item.
Document the memory source identity rule in AGENTS.md and CLAUDE.md.

Problem

Composio selector providers were mixing document dedupe identity with source/archive identity. Notion used notion:{connection_id}:{page_id} as the document source id, which caused every selected page to create a separate raw source folder and graph source. The same per-item pattern existed in GitHub issues/PRs, Linear issues, and ClickUp tasks.

Solution

Keep per-item metadata.source_id values for dedupe and item-level ingest bookkeeping.
Set metadata.path_scope with ingest_document_with_scope to a stable collection identity, currently <provider>:<connection_id> for the affected Composio selectors.
Add a path_scope column and migration for mem_tree_chunks, and hydrate it back into Metadata on reads.
Build Obsidian source tags from path_scope when present, falling back to source_id for legacy chunks.
Add focused tests proving the connection-level scope is persisted and used for Notion/GitHub/Linear/ClickUp ingest output.

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/pr-ci.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge. Local full coverage not run; CI enforces changed-line coverage. Initial CI exposed unrelated full-suite Rust flakes; follow-up commit 5448794 stabilizes those tests. CodeRabbit follow-up commit 1b25aaf preserves scoped source tags during extraction tag rewrites. CI follow-up commit 97abc5f stabilizes the Rust core coverage failures exposed by the full-suite order. Follow-up commit 287422e makes the triage failed-event assertions tolerate approval-denied and missing-agent failure paths. Follow-up commit 95b2d56 redacts source identifiers in the scoped tag debug log.
Coverage matrix updated — N/A: internal memory ingest/source identity fix; no feature row added/removed/renamed.
All affected feature IDs from the matrix are listed in the PR description under ## Related — N/A: no coverage matrix feature IDs affected.
No new external network dependencies introduced (mock backend used per Testing Strategy)
Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) — N/A: no release manual smoke surface changed.
Linked issue closed via Closes #NNN in the ## Related section — N/A: no GitHub issue provided.

Impact

This affects Rust memory ingest and memory chunk storage. Existing chunks continue to read with path_scope = None and fall back to the old source_id behavior. New selector ingests group user-visible raw source folders and Obsidian source tags under the stable selector scope. Existing bad raw folders are not migrated by this PR.

AI Authored PR Metadata (required for Codex/Linear PRs)

Keep this section for AI-authored PRs. For human-only PRs, mark each field N/A.

Linear Issue

Key: N/A
URL: N/A

Commit & Branch

Branch: fix/memory-raw-source-identity
Commit SHA: 95b2d564

Validation Run

pnpm --filter openhuman-app format:check via pnpm format:check
pnpm typecheck via pnpm compile after pnpm install --frozen-lockfile refreshed local node_modules
Focused tests:
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib memory_sync::composio::providers::notion::ingest::tests::ingest_page_writes_to_memory_tree
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib memory_sync::composio::providers::github::ingest::tests::ingest_issue_writes_to_memory_tree
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib memory_sync::composio::providers::linear::ingest::tests::ingest_issue_writes_to_memory_tree
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib memory_sync::composio::providers::clickup::ingest::tests::ingest_task_writes_to_memory_tree
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib upsert_persists_path_scope
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib schema_has_content_path_and_content_sha256_columns
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::memory_store::content::compose::tests::compose_persists_path_scope_and_seeds_scoped_source_tag -- --nocapture
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::memory_store::content::tags::tests::update_chunk_tags_prefers_path_scope_for_source_tag -- --nocapture
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib path_scope -- --nocapture
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::agent::triage::escalation::tests::apply_decision_react_failure_publishes_failed_event -- --nocapture
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::agent::triage::escalation::tests::apply_decision_escalate_failure_publishes_failed_event -- --nocapture
- GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::memory_tree::tree::rpc::tests::pipeline_status_reports_chunk_aggregates_after_ingest -- --nocapture
Rust fmt/check (if changed): cargo fmt --manifest-path Cargo.toml; GGML_NATIVE=OFF cargo check --manifest-path Cargo.toml
Tauri fmt/check (if changed): pre-push hook ran pnpm rust:check / cargo check --manifest-path app/src-tauri/Cargo.toml

Validation Blocked

command: GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib
error: repo-wide run completed with 10630 passed; 329 failed; 23 ignored
impact: broad suite is not a clean signal for this patch; focused memory ingest/store tests listed above passed.

Behavior Changes

Intended behavior change: selector ingests now create one logical source scope per connection/selector instead of one source scope per item.
User-visible effect: new Notion/GitHub/Linear/ClickUp selector ingests should stop polluting the memory graph/raw archive with per-item source folders and tags.

Parity Contract

Legacy behavior preserved: item-level metadata.source_id remains for dedupe and source-ingested checks.
Guard/fallback/dispatch parity checks: chunks without path_scope fall back to source_id when composing source tags.

Duplicate / Superseded PR Handling

Duplicate PR(s): N/A
Canonical PR: this PR
Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

New Features
- Store and use a connection-level "path_scope" for ingested content across GitHub, Linear, Notion, and ClickUp so chunks are grouped by connection-level source.
Bug Fixes
- Improved test reliability for escalation and proactive delivery paths; adjusted status tests to accept degraded/running outcomes.
Documentation
- Clarified contributor guidelines: use collection-level stable source identity; per-item IDs are for dedupe only.

coderabbitai · 2026-06-03T02:36:26Z

📝 Walkthrough

Walkthrough

Adds a connection/collection-level source scope (metadata.path_scope), persists it in the chunk store, seeds front-matter/tags from it when present, updates provider ingests to call ingest_document_with_scope with connection-level scopes, and adjusts docs/tests and test-only guards.

Changes

Connection-level source scope for memory organization

Layer / File(s)	Summary
Documentation and test infrastructure `AGENTS.md`, `CLAUDE.md`, `src/openhuman/channels/proactive.rs`, `src/openhuman/agent/triage/escalation.rs`, `src/openhuman/memory_tree/tree/rpc.rs`	Adds contributor guidance requiring collection-level source identity in `metadata.path_scope`; gate approval logic under tests; make escalation tests generate missing-agent IDs and relax event reason checks; accept `running` or `degraded` pipeline status in tests.
Database schema and chunk storage layer `src/openhuman/memory_store/chunks/connection.rs`, `src/openhuman/memory_store/chunks/store.rs`, `src/openhuman/memory_store/chunks/store_tests.rs`	Add nullable `path_scope TEXT` to `mem_tree_chunks` via idempotent migration; include `path_scope` in all INSERT/ON CONFLICT upserts and SELECTs; decode into `Chunk.metadata.path_scope`; tests verify migration and round-trip persistence.
Content composition and tagging `src/openhuman/memory_store/content/compose.rs`, `src/openhuman/memory_store/content/tags.rs`	Front-matter optionally emits `path_scope:` and tag helpers seed `source/<slug>` from `path_scope` when present (falling back to `source_id`); tag-update logic prefers `path_scope`. Unit tests added/updated.
GitHub provider ingestion and scoping `src/openhuman/memory_sync/composio/providers/github/ingest.rs`	Adds `github_source_scope(connection_id)` producing `github:{connection_id}`; switches `ingest_issue_into_memory_tree` to `ingest_document_with_scope` with `path_scope`; tests assert scope stability and persisted `path_scope`.
Linear provider ingestion and scoping `src/openhuman/memory_sync/composio/providers/linear/ingest.rs`	Adds `linear_source_scope(connection_id)` producing `linear:{connection_id}`; switches ingest to `ingest_document_with_scope` with `path_scope`; tests assert scope stability and chunk metadata.
Notion provider ingestion and scoping `src/openhuman/memory_sync/composio/providers/notion/ingest.rs`	Adds `notion_source_scope(connection_id)` and switches ingest to `ingest_document_with_scope` with connection-level `path_scope`; updates regression tests to verify connection-scoped file paths, metadata, and tag composition while preserving per-page `source_id` registration.
ClickUp provider ingestion and scoping `src/openhuman/memory_sync/composio/providers/clickup/ingest.rs`	Adds `clickup_source_scope(connection_id)` and switches ingest to `ingest_document_with_scope` with connection-level `path_scope`; tests verify scope persistence and stability.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

tinyhumansai/openhuman#2887: Prior Notion ingest changes that handled per-page source IDs; this PR evolves that module to use connection-level path_scope.
tinyhumansai/openhuman#2889: Earlier GitHub ingest work; related because this PR introduces github_source_scope and connection-level scoping.
tinyhumansai/openhuman#3018: Prior Linear ingest changes; this PR adds connection-level linear_source_scope and updates ingestion accordingly.

Suggested reviewers

sanil-23
oxoxDev
graycyrus

Poem

🐰 Hopping through code with a happy scope,
Path_scope for order, source_id for hope.
Tags align neatly, chunks find their place,
One logical source, tidy memory space.
🍃

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(memory): group selector ingests by source scope' accurately describes the main change: introducing connection-level source scope grouping for selector-based ingests instead of per-item identities.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/openhuman/memory_store/content/compose.rs (1)

632-873: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add test coverage for path_scope-based tag seeding.

All existing test chunks have path_scope: None (lines 645, 726, 787, 860), so the new behavior introduced at lines 139-140 (using path_scope for the source/<slug> tag when present) is untested. Per coding guidelines and learnings, new behavior must ship with test coverage.

Add a test case that:

Creates a chunk with both source_id and path_scope set (e.g., source_id: "github:issue:456", path_scope: Some("github:conn123"))
Composes the chunk
Asserts the tags: block contains source/github-conn123 (from path_scope) and not source/github-issue-456 (from source_id)
Asserts the source_id: field in front-matter still writes the original source_id value (for deduplication contract)

🧪 Suggested test case

#[test]
fn compose_chunk_with_path_scope_seeds_tag_from_scope() {
    let ts = chrono::Utc.timestamp_millis_opt(1_700_000_000_000).unwrap();
    let chunk = Chunk {
        id: "test-scope-chunk".into(),
        content: "test body".into(),
        metadata: Metadata {
            source_kind: SourceKind::Chat,
            source_id: "github:issue:456".into(), // per-item ID
            owner: "alice@example.com".into(),
            timestamp: ts,
            time_range: (ts, ts),
            tags: vec![],
            source_ref: None,
            path_scope: Some("github:conn123".into()), // connection-level scope
        },
        token_count: 2,
        seq_in_source: 0,
        created_at: ts,
        partial_message: false,
    };
    let (full, _body) = compose_chunk_file(&chunk);
    let full_str = std::str::from_utf8(&full).unwrap();
    
    // Tag must use path_scope, not source_id
    assert!(
        full_str.contains("  - source/github-conn123"),
        "tags must seed from path_scope when present; got:\n{full_str}"
    );
    assert!(
        !full_str.contains("  - source/github-issue-456"),
        "tags must NOT seed from source_id when path_scope is present; got:\n{full_str}"
    );
    
    // Front-matter source_id field must still write the original source_id (dedup key)
    assert!(
        full_str.contains("source_id: \"github:issue:456\""),
        "source_id field must preserve original value; got:\n{full_str}"
    );
}

Based on learnings: Ship enough unit tests for the behavior you are adding or changing before building additional features on top of it.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_store/content/compose.rs` around lines 632 - 873, New
behavior that seeds the source/<slug> tag from Metadata.path_scope when present
is untested; add a unit test that constructs a Chunk with both
metadata.source_id (e.g., "github:issue:456") and metadata.path_scope =
Some("github:conn123"), calls compose_chunk_file(&chunk), and asserts the
front-matter tags block contains "  - source/github-conn123" and does NOT
contain "  - source/github-issue-456", and also asserts the front-matter still
contains the original source_id value ("source_id: \"github:issue:456\""); place
the test alongside the other compose tests (use the same chrono timestamp
pattern) and name it something like
compose_chunk_with_path_scope_seeds_tag_from_scope so it clearly targets
compose_chunk_file behavior.

🧹 Nitpick comments (4)

src/openhuman/memory_sync/composio/providers/github/ingest.rs (1)

195-200: ⚡ Quick win

Log the derived connection scope around the ingest call.

This is the new behavior in the file, but the trace path still omits source_id / path_scope, so a scope-grouping regression will be hard to diagnose from logs alone. Please add a trace/debug entry before the ingest_document_with_scope call and include the scope fields in the success/error path.

Suggested logging shape

     let tags: Vec<String> = DEFAULT_TAGS.iter().map(|s| s.to_string()).collect();
     let owner = github_source_scope(connection_id);
     let path_scope = Some(owner.clone());

+    tracing::trace!(
+        connection_id = %connection_id,
+        issue_id = %issue_id,
+        source_id = %source_id,
+        path_scope = %owner,
+        "[composio:github] ingest: calling ingest_document_with_scope"
+    );
     match ingest_pipeline::ingest_document_with_scope(
         config, &source_id, &owner, tags, doc, path_scope,
     )
     .await
     {
@@
             tracing::debug!(
                 connection_id = %connection_id,
                 issue_id = %issue_id,
+                source_id = %source_id,
+                path_scope = %owner,
                 chunks_written,
                 already_ingested,
                 "[composio:github] ingest: issue persisted"
             );

As per coding guidelines, "Add substantial, development-oriented debug or trace logs on new/changed flows using log / tracing crate with stable, grep-friendly prefixes" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_sync/composio/providers/github/ingest.rs` around lines
195 - 200, Add a debug/trace log right before calling
ingest_pipeline::ingest_document_with_scope that logs the derived scope fields
(owner from github_source_scope(connection_id)), source_id and path_scope so
they appear in tracing output; then update the success and error branches around
that call to include the same stable, grep-friendly log prefix and those fields
(source_id, path_scope, owner/connection_id) so both success and failure paths
record the scope for diagnostics.

src/openhuman/memory_sync/composio/providers/notion/ingest.rs (1)

170-175: ⚡ Quick win

Include the connection scope in Notion ingest diagnostics.

This PR changes how pages are grouped on disk and in tags, but the changed call path still does not log the derived path_scope. Please surface it in the call-boundary logs so path/tag mis-grouping can be debugged without inspecting SQLite or content files.

Suggested logging shape

     let tags: Vec<String> = DEFAULT_TAGS.iter().map(|s| s.to_string()).collect();
     let owner = notion_source_scope(connection_id);
     let path_scope = Some(owner.clone());

+    tracing::trace!(
+        connection_id = %connection_id,
+        page_id = %page_id,
+        source_id = %source_id,
+        path_scope = %owner,
+        "[composio:notion] ingest: calling ingest_document_with_scope"
+    );
     match ingest_pipeline::ingest_document_with_scope(
         config, &source_id, &owner, tags, doc, path_scope,
     )
     .await
     {
@@
             tracing::debug!(
                 connection_id = %connection_id,
                 page_id = %page_id,
+                source_id = %source_id,
+                path_scope = %owner,
                 chunks_written,
                 already_ingested,
                 "[composio:notion] ingest: page persisted"
             );

As per coding guidelines, "Add substantial, development-oriented debug or trace logs on new/changed flows using log / tracing crate with stable, grep-friendly prefixes" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_sync/composio/providers/notion/ingest.rs` around lines
170 - 175, Before calling ingest_pipeline::ingest_document_with_scope, emit a
debug/trace log that includes the derived connection scope (owner from
notion_source_scope), the computed path_scope, and the source_id so consumers
can grep for the boundary and troubleshoot path/tag mis-grouping; specifically,
log a clearly prefixed message (e.g., "notion:ingest:scope") that prints
owner.clone(), path_scope (Some(owner.clone())), and &source_id immediately
before the ingest_document_with_scope invocation in ingest.rs.

src/openhuman/memory_sync/composio/providers/clickup/ingest.rs (1)

107-112: ⚡ Quick win

Log the new ClickUp scope at the ingest boundary.

The connection-scoped grouping is the changed behavior here, but the trace/debug output still hides the derived path_scope. Adding it to the external-call and success/error logs would make grouping regressions much easier to isolate.

Suggested logging shape

     let tags: Vec<String> = DEFAULT_TAGS.iter().map(|s| s.to_string()).collect();
     let owner = clickup_source_scope(connection_id);
     let path_scope = Some(owner.clone());

+    tracing::trace!(
+        connection_id = %connection_id,
+        task_id = %task_id,
+        source_id = %source_id,
+        path_scope = %owner,
+        "[composio:clickup] ingest: calling ingest_document_with_scope"
+    );
     match ingest_pipeline::ingest_document_with_scope(
         config, &source_id, &owner, tags, doc, path_scope,
     )
     .await
     {
@@
             tracing::debug!(
                 connection_id = %connection_id,
                 task_id = %task_id,
+                source_id = %source_id,
+                path_scope = %owner,
                 chunks_written,
                 already_ingested,
                 "[composio:clickup] ingest: task persisted"
             );

As per coding guidelines, "Add substantial, development-oriented debug or trace logs on new/changed flows using log / tracing crate with stable, grep-friendly prefixes" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_sync/composio/providers/clickup/ingest.rs` around lines
107 - 112, Log the derived ClickUp scope (path_scope) at the ingest boundary and
include it in the external-call and success/error logs: call out the derived
scope returned by clickup_source_scope(connection_id) (owner/path_scope) before
calling ingest_pipeline::ingest_document_with_scope, and add trace/debug logging
with a stable grep-friendly prefix that records the config, source_id, owner,
and path_scope being passed; also ensure the result handling (success and error
branches around ingest_pipeline::ingest_document_with_scope) logs the same
path_scope alongside success/error details to aid grouping/regression debugging.

src/openhuman/memory_sync/composio/providers/linear/ingest.rs (1)

100-105: ⚡ Quick win

Expose path_scope in the Linear ingest logs.

The new connection-scoped behavior is only visible in persisted data right now. Adding source_id / path_scope to the pre-call and success/error logs will make scope-routing regressions much easier to trace.

Suggested logging shape

     let tags: Vec<String> = DEFAULT_TAGS.iter().map(|s| s.to_string()).collect();
     let owner = linear_source_scope(connection_id);
     let path_scope = Some(owner.clone());

+    tracing::trace!(
+        connection_id = %connection_id,
+        issue_id = %issue_id,
+        source_id = %source_id,
+        path_scope = %owner,
+        "[composio:linear] ingest: calling ingest_document_with_scope"
+    );
     match ingest_pipeline::ingest_document_with_scope(
         config, &source_id, &owner, tags, doc, path_scope,
     )
     .await
     {
@@
             tracing::debug!(
                 connection_id = %connection_id,
                 issue_id = %issue_id,
+                source_id = %source_id,
+                path_scope = %owner,
                 chunks_written,
                 already_ingested,
                 "[composio:linear] ingest: issue persisted"
             );

As per coding guidelines, "Add substantial, development-oriented debug or trace logs on new/changed flows using log / tracing crate with stable, grep-friendly prefixes" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_sync/composio/providers/linear/ingest.rs` around lines
100 - 105, Add debug/trace logs around the call to
ingest_pipeline::ingest_document_with_scope in ingest.rs so the new connection
scope is visible: log a pre-call message that includes source_id and path_scope
(owner.clone()) and the call context, and log success and error outcomes with
the same stable, grep-friendly prefix showing source_id and path_scope; use the
tracing/log crate at debug or trace level and include the variables owner (from
linear_source_scope), source_id, and path_scope in the messages to make
scope-routing regressions easy to trace.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_store/content/compose.rs`:
- Around line 139-140: Add a debug/tracing log around the new tag-seeding branch
so it records which scope is used: import tracing::debug in this module, then
before calling with_source_tag use
meta.path_scope.as_deref().unwrap_or(&meta.source_id) (source_scope) and emit a
debug log with a stable prefix like "[memory-compose] seeding source tag" plus
fields scope=source_scope, source_id=meta.source_id and
using_path_scope=meta.path_scope.is_some(); then call
with_source_tag(source_scope, &meta.tags). This ensures source_scope,
meta.path_scope and with_source_tag are the correlated symbols referenced in the
log.
- Around line 139-140: The front-matter tag rewrite currently loses
connection-level scoping because compose.rs seeds source_scope from
meta.path_scope but does not persist path_scope into the front-matter; update
the code that serializes front-matter to include path_scope when
meta.path_scope.is_some(), and change
src/openhuman/memory_store/content/tags.rs::augment_with_source_tag_for_chunk to
read scan_fm_field(fm, "path_scope") first and use that value for the
source/<...> tag, falling back to scan_fm_field(fm, "source_id") if path_scope
is absent; also add a unit test that sets path_scope = Some("...") and verifies
the rewrite path preserves the connection-level source/<...> tag.

---

Outside diff comments:
In `@src/openhuman/memory_store/content/compose.rs`:
- Around line 632-873: New behavior that seeds the source/<slug> tag from
Metadata.path_scope when present is untested; add a unit test that constructs a
Chunk with both metadata.source_id (e.g., "github:issue:456") and
metadata.path_scope = Some("github:conn123"), calls compose_chunk_file(&chunk),
and asserts the front-matter tags block contains "  - source/github-conn123" and
does NOT contain "  - source/github-issue-456", and also asserts the
front-matter still contains the original source_id value ("source_id:
\"github:issue:456\""); place the test alongside the other compose tests (use
the same chrono timestamp pattern) and name it something like
compose_chunk_with_path_scope_seeds_tag_from_scope so it clearly targets
compose_chunk_file behavior.

---

Nitpick comments:
In `@src/openhuman/memory_sync/composio/providers/clickup/ingest.rs`:
- Around line 107-112: Log the derived ClickUp scope (path_scope) at the ingest
boundary and include it in the external-call and success/error logs: call out
the derived scope returned by clickup_source_scope(connection_id)
(owner/path_scope) before calling ingest_pipeline::ingest_document_with_scope,
and add trace/debug logging with a stable grep-friendly prefix that records the
config, source_id, owner, and path_scope being passed; also ensure the result
handling (success and error branches around
ingest_pipeline::ingest_document_with_scope) logs the same path_scope alongside
success/error details to aid grouping/regression debugging.

In `@src/openhuman/memory_sync/composio/providers/github/ingest.rs`:
- Around line 195-200: Add a debug/trace log right before calling
ingest_pipeline::ingest_document_with_scope that logs the derived scope fields
(owner from github_source_scope(connection_id)), source_id and path_scope so
they appear in tracing output; then update the success and error branches around
that call to include the same stable, grep-friendly log prefix and those fields
(source_id, path_scope, owner/connection_id) so both success and failure paths
record the scope for diagnostics.

In `@src/openhuman/memory_sync/composio/providers/linear/ingest.rs`:
- Around line 100-105: Add debug/trace logs around the call to
ingest_pipeline::ingest_document_with_scope in ingest.rs so the new connection
scope is visible: log a pre-call message that includes source_id and path_scope
(owner.clone()) and the call context, and log success and error outcomes with
the same stable, grep-friendly prefix showing source_id and path_scope; use the
tracing/log crate at debug or trace level and include the variables owner (from
linear_source_scope), source_id, and path_scope in the messages to make
scope-routing regressions easy to trace.

In `@src/openhuman/memory_sync/composio/providers/notion/ingest.rs`:
- Around line 170-175: Before calling
ingest_pipeline::ingest_document_with_scope, emit a debug/trace log that
includes the derived connection scope (owner from notion_source_scope), the
computed path_scope, and the source_id so consumers can grep for the boundary
and troubleshoot path/tag mis-grouping; specifically, log a clearly prefixed
message (e.g., "notion:ingest:scope") that prints owner.clone(), path_scope
(Some(owner.clone())), and &source_id immediately before the
ingest_document_with_scope invocation in ingest.rs.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fdd957ba-7413-4d97-9abf-8598d3edde29

📥 Commits

Reviewing files that changed from the base of the PR and between 24c7d31 and 5448794.

📒 Files selected for processing (12)

AGENTS.md
CLAUDE.md
src/openhuman/agent/triage/escalation.rs
src/openhuman/channels/proactive.rs
src/openhuman/memory_store/chunks/connection.rs
src/openhuman/memory_store/chunks/store.rs
src/openhuman/memory_store/chunks/store_tests.rs
src/openhuman/memory_store/content/compose.rs
src/openhuman/memory_sync/composio/providers/clickup/ingest.rs
src/openhuman/memory_sync/composio/providers/github/ingest.rs
src/openhuman/memory_sync/composio/providers/linear/ingest.rs
src/openhuman/memory_sync/composio/providers/notion/ingest.rs

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

src/openhuman/agent/triage/escalation.rs (1)
244-249: ⚖️ Poor tradeoff

Consider refactoring the test guard to use registry mocking instead of conditional compilation.

The #[cfg(test)] guard inside dispatch_target_agent mixes test concerns into production code. While safe (the prefix is unlikely to collide with real agent IDs), this pattern makes the function's contract less clear. A cleaner approach would be to mock or stub the AgentDefinitionRegistry in tests to return None for the missing agent, allowing the production error path at line 268 to handle it naturally.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/triage/escalation.rs` around lines 244 - 249, The
#[cfg(test)] conditional inside dispatch_target_agent introduces test-only
behavior; remove this test guard and the special-case return so
dispatch_target_agent always follows the normal production path (letting the
existing Err path for missing agent definitions run). Update tests that relied
on the missing-agent- prefix to instead mock or stub AgentDefinitionRegistry (or
the method that looks up definitions) so it returns None for the test agent id,
or inject a test registry implementation into dispatch_target_agent's
caller/context; ensure tests verify the same error is produced by the standard
not-found branch rather than using the cfg(test) shortcut.
src/openhuman/memory_tree/tree/rpc.rs (1)
1135-1150: ⚖️ Poor tradeoff

Environment-dependent test reduces determinism.

The test now accepts either "running" or "degraded" status based on provider availability in the test environment. While both outcomes are validated (terminal, non-syncing, reason check for degraded), this makes the test environment-dependent and could mask regressions if the environment changes.

Consider one of these alternatives:

Mock or stub the provider/embedder availability to force deterministic behavior.

Split into two tests: one that validates the "running" path with a configured provider, and one that validates the "degraded" path without a provider.

Use a test configuration flag to explicitly control which outcome is expected.

Based on learnings: tests should be deterministic and avoid hidden global state.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/tree/rpc.rs` around lines 1135 - 1150, The current
test's match on out.status.as_str() (checking "running" vs "degraded" and
inspecting out.reason) is environment-dependent; make the test deterministic by
controlling provider/embedder availability rather than accepting either outcome.
Fix by changing the test harness that calls the code which produces out (the
code that sets out.status/out.reason) to explicitly configure provider
availability (e.g., set a test config, environment variable, or call the
existing test helper to enable/disable the embedder) and then assert the single
expected status ("running" when provider enabled, or "degraded" and the
semantic-recall reason when disabled). Alternatively split the single test into
two tests: one that forces provider present and asserts out.status == "running"
and out.reason.is_none(), and one that forces provider absent and asserts
out.status == "degraded" with the semantic recall message; update any
setup/teardown to stub/moc k the provider accordingly so out.status/out.reason
are deterministic.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_store/content/compose.rs`:
- Around line 143-148: The debug log in content_store::compose that prints
meta.source_id and source_scope may leak PII; update the log::debug call in
src/openhuman/memory_store/content/compose.rs to wrap meta.source_id and
source_scope with redact() (e.g., redact(&meta.source_id),
redact(&source_scope)) and leave meta.path_scope.is_some() as-is so the boolean
is preserved; ensure you import or reference the existing redact function used
across the codebase and update the log invocation to pass the redacted values
instead of raw ones.

---

Nitpick comments:
In `@src/openhuman/agent/triage/escalation.rs`:
- Around line 244-249: The #[cfg(test)] conditional inside dispatch_target_agent
introduces test-only behavior; remove this test guard and the special-case
return so dispatch_target_agent always follows the normal production path
(letting the existing Err path for missing agent definitions run). Update tests
that relied on the missing-agent- prefix to instead mock or stub
AgentDefinitionRegistry (or the method that looks up definitions) so it returns
None for the test agent id, or inject a test registry implementation into
dispatch_target_agent's caller/context; ensure tests verify the same error is
produced by the standard not-found branch rather than using the cfg(test)
shortcut.

In `@src/openhuman/memory_tree/tree/rpc.rs`:
- Around line 1135-1150: The current test's match on out.status.as_str()
(checking "running" vs "degraded" and inspecting out.reason) is
environment-dependent; make the test deterministic by controlling
provider/embedder availability rather than accepting either outcome. Fix by
changing the test harness that calls the code which produces out (the code that
sets out.status/out.reason) to explicitly configure provider availability (e.g.,
set a test config, environment variable, or call the existing test helper to
enable/disable the embedder) and then assert the single expected status
("running" when provider enabled, or "degraded" and the semantic-recall reason
when disabled). Alternatively split the single test into two tests: one that
forces provider present and asserts out.status == "running" and
out.reason.is_none(), and one that forces provider absent and asserts out.status
== "degraded" with the semantic recall message; update any setup/teardown to
stub/moc k the provider accordingly so out.status/out.reason are deterministic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f9c95d95-62d0-4cda-8870-78a497a887e1

📥 Commits

Reviewing files that changed from the base of the PR and between 5448794 and 287422e.

📒 Files selected for processing (4)

src/openhuman/agent/triage/escalation.rs
src/openhuman/memory_store/content/compose.rs
src/openhuman/memory_store/content/tags.rs
src/openhuman/memory_tree/tree/rpc.rs

fix(memory): group selector ingests by source scope

412055c

senamakel requested a review from a team June 3, 2026 02:36

test(core): stabilize full-suite coverage flakes

5448794

coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. bug labels Jun 3, 2026

coderabbitai Bot requested changes Jun 3, 2026

View reviewed changes

Comment thread src/openhuman/memory_store/content/compose.rs

fix(memory): preserve scoped source tags during rewrites

1b25aaf

coderabbitai Bot previously approved these changes Jun 3, 2026

View reviewed changes

test(core): stabilize coverage-only rust failures

97abc5f

senamakel dismissed coderabbitai[bot]’s stale review via 97abc5f June 3, 2026 03:58

test(core): tolerate approval-denied triage failures

287422e

coderabbitai Bot requested changes Jun 3, 2026

View reviewed changes

Comment thread src/openhuman/memory_store/content/compose.rs

senamakel merged commit 5f60b54 into tinyhumansai:main Jun 3, 2026
22 of 25 checks passed

coderabbitai Bot approved these changes Jun 3, 2026

View reviewed changes

senamakel mentioned this pull request Jun 3, 2026

fix(memory): redact source tag debug identifiers #3272

Merged

12 tasks

This was referenced Jun 3, 2026

test(memory_tree): de-flake pipeline_status_reports_chunk_aggregates_after_ingest (#3256) #3273

Closed

Flaky test: memory_tree pipeline_status_reports_chunk_aggregates_after_ingest #3256

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(memory): group selector ingests by source scope#3264

fix(memory): group selector ingests by source scope#3264
senamakel merged 5 commits into
tinyhumansai:mainfrom
senamakel:fix/memory-raw-source-identity

senamakel commented Jun 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

senamakel commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Related

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

Validation Run

Validation Blocked

Behavior Changes

Parity Contract

Duplicate / Superseded PR Handling

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

senamakel commented Jun 3, 2026 •

edited

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading