fix(memory): group selector ingests by source scope#3264
📝 Walkthrough
Adds a connection/collection-level source scope (
Changes
Connection-level source scope for memory organization
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openhuman/memory_store/content/compose.rs (1)
632-873: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Add test coverage for `path_scope`-based tag seeding.
All existing test chunks have `path_scope: None` (lines 645, 726, 787, 860), so the new behavior introduced at lines 139-140 (using `path_scope` for the `source/<slug>` tag when present) is untested. Per coding guidelines and learnings, new behavior must ship with test coverage.
Add a test case that:
- Creates a chunk with both `source_id` and `path_scope` set (e.g., `source_id: "github:issue:456"`, `path_scope: Some("github:conn123")`)
- Composes the chunk
- Asserts the `tags:` block contains `source/github-conn123` (from `path_scope`) and not `source/github-issue-456` (from `source_id`)
- Asserts the `source_id:` field in front-matter still writes the original `source_id` value (for deduplication contract)
🧪 Suggested test case
```rust
#[test]
fn compose_chunk_with_path_scope_seeds_tag_from_scope() {
    let ts = chrono::Utc.timestamp_millis_opt(1_700_000_000_000).unwrap();
    let chunk = Chunk {
        id: "test-scope-chunk".into(),
        content: "test body".into(),
        metadata: Metadata {
            source_kind: SourceKind::Chat,
            source_id: "github:issue:456".into(), // per-item ID
            owner: "alice@example.com".into(),
            timestamp: ts,
            time_range: (ts, ts),
            tags: vec![],
            source_ref: None,
            path_scope: Some("github:conn123".into()), // connection-level scope
        },
        token_count: 2,
        seq_in_source: 0,
        created_at: ts,
        partial_message: false,
    };
    let (full, _body) = compose_chunk_file(&chunk);
    let full_str = std::str::from_utf8(&full).unwrap();

    // Tag must use path_scope, not source_id
    assert!(
        full_str.contains(" - source/github-conn123"),
        "tags must seed from path_scope when present; got:\n{full_str}"
    );
    assert!(
        !full_str.contains(" - source/github-issue-456"),
        "tags must NOT seed from source_id when path_scope is present; got:\n{full_str}"
    );

    // Front-matter source_id field must still write the original source_id (dedup key)
    assert!(
        full_str.contains("source_id: \"github:issue:456\""),
        "source_id field must preserve original value; got:\n{full_str}"
    );
}
```
Based on learnings: Ship enough unit tests for the behavior you are adding or changing before building additional features on top of it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_store/content/compose.rs` around lines 632 - 873, New behavior that seeds the source/<slug> tag from Metadata.path_scope when present is untested; add a unit test that constructs a Chunk with both metadata.source_id (e.g., "github:issue:456") and metadata.path_scope = Some("github:conn123"), calls compose_chunk_file(&chunk), and asserts the front-matter tags block contains " - source/github-conn123" and does NOT contain " - source/github-issue-456", and also asserts the front-matter still contains the original source_id value ("source_id: \"github:issue:456\""); place the test alongside the other compose tests (use the same chrono timestamp pattern) and name it something like compose_chunk_with_path_scope_seeds_tag_from_scope so it clearly targets compose_chunk_file behavior.
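The fallback rule this comment asks to test can be sketched in isolation. This is a minimal sketch, not the code in `compose.rs`: the two-field `Metadata` struct, `slugify`, and `source_tag` are hypothetical names, and the slug rule (non-alphanumeric characters become `-`) is an assumption inferred only from the example tags quoted above.

```rust
/// Hypothetical stand-in for the chunk metadata; only the two fields
/// discussed in the review are modeled here.
struct Metadata {
    source_id: String,          // per-item dedupe key, e.g. "github:issue:456"
    path_scope: Option<String>, // connection-level scope, e.g. "github:conn123"
}

/// Assumed slug rule: ASCII alphanumerics are kept (lowercased) and every
/// other character becomes '-'. The real slug helper may differ.
fn slugify(raw: &str) -> String {
    raw.chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_ascii_lowercase()
            } else {
                '-'
            }
        })
        .collect()
}

/// Prefer `path_scope` when present, otherwise fall back to `source_id`,
/// which is the behavior legacy chunks keep.
fn source_tag(meta: &Metadata) -> String {
    let scope = meta.path_scope.as_deref().unwrap_or(&meta.source_id);
    format!("source/{}", slugify(scope))
}
```

Under these assumptions, a chunk carrying both fields is tagged from the connection scope, while a chunk with `path_scope: None` is still tagged from its `source_id`.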
🧹 Nitpick comments (4)
src/openhuman/memory_sync/composio/providers/github/ingest.rs (1)
195-200: ⚡ Quick win
Log the derived connection scope around the ingest call.
This is the new behavior in the file, but the trace path still omits `source_id`/`path_scope`, so a scope-grouping regression will be hard to diagnose from logs alone. Please add a trace/debug entry before the `ingest_document_with_scope` call and include the scope fields in the success/error path.
Suggested logging shape
```diff
     let tags: Vec<String> = DEFAULT_TAGS.iter().map(|s| s.to_string()).collect();
     let owner = github_source_scope(connection_id);
     let path_scope = Some(owner.clone());
+    tracing::trace!(
+        connection_id = %connection_id,
+        issue_id = %issue_id,
+        source_id = %source_id,
+        path_scope = %owner,
+        "[composio:github] ingest: calling ingest_document_with_scope"
+    );
     match ingest_pipeline::ingest_document_with_scope(
         config,
         &source_id,
         &owner,
         tags,
         doc,
         path_scope,
     )
     .await
     {
@@
             tracing::debug!(
                 connection_id = %connection_id,
                 issue_id = %issue_id,
+                source_id = %source_id,
+                path_scope = %owner,
                 chunks_written,
                 already_ingested,
                 "[composio:github] ingest: issue persisted"
             );
```
As per coding guidelines, "Add substantial, development-oriented `debug` or `trace` logs on new/changed flows using `log`/`tracing` crate with stable, grep-friendly prefixes" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_sync/composio/providers/github/ingest.rs` around lines 195 - 200, Add a debug/trace log right before calling ingest_pipeline::ingest_document_with_scope that logs the derived scope fields (owner from github_source_scope(connection_id)), source_id and path_scope so they appear in tracing output; then update the success and error branches around that call to include the same stable, grep-friendly log prefix and those fields (source_id, path_scope, owner/connection_id) so both success and failure paths record the scope for diagnostics.
src/openhuman/memory_sync/composio/providers/notion/ingest.rs (1)
170-175: ⚡ Quick win
Include the connection scope in Notion ingest diagnostics.
This PR changes how pages are grouped on disk and in tags, but the changed call path still does not log the derived `path_scope`. Please surface it in the call-boundary logs so path/tag mis-grouping can be debugged without inspecting SQLite or content files.
Suggested logging shape
```diff
     let tags: Vec<String> = DEFAULT_TAGS.iter().map(|s| s.to_string()).collect();
     let owner = notion_source_scope(connection_id);
     let path_scope = Some(owner.clone());
+    tracing::trace!(
+        connection_id = %connection_id,
+        page_id = %page_id,
+        source_id = %source_id,
+        path_scope = %owner,
+        "[composio:notion] ingest: calling ingest_document_with_scope"
+    );
     match ingest_pipeline::ingest_document_with_scope(
         config,
         &source_id,
         &owner,
         tags,
         doc,
         path_scope,
     )
     .await
     {
@@
             tracing::debug!(
                 connection_id = %connection_id,
                 page_id = %page_id,
+                source_id = %source_id,
+                path_scope = %owner,
                 chunks_written,
                 already_ingested,
                 "[composio:notion] ingest: page persisted"
             );
```
As per coding guidelines, "Add substantial, development-oriented `debug` or `trace` logs on new/changed flows using `log`/`tracing` crate with stable, grep-friendly prefixes" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_sync/composio/providers/notion/ingest.rs` around lines 170 - 175, Before calling ingest_pipeline::ingest_document_with_scope, emit a debug/trace log that includes the derived connection scope (owner from notion_source_scope), the computed path_scope, and the source_id so consumers can grep for the boundary and troubleshoot path/tag mis-grouping; specifically, log a clearly prefixed message (e.g., "notion:ingest:scope") that prints owner.clone(), path_scope (Some(owner.clone())), and &source_id immediately before the ingest_document_with_scope invocation in ingest.rs.
src/openhuman/memory_sync/composio/providers/clickup/ingest.rs (1)
107-112: ⚡ Quick win
Log the new ClickUp scope at the ingest boundary.
The connection-scoped grouping is the changed behavior here, but the trace/debug output still hides the derived `path_scope`. Adding it to the external-call and success/error logs would make grouping regressions much easier to isolate.
Suggested logging shape
```diff
     let tags: Vec<String> = DEFAULT_TAGS.iter().map(|s| s.to_string()).collect();
     let owner = clickup_source_scope(connection_id);
     let path_scope = Some(owner.clone());
+    tracing::trace!(
+        connection_id = %connection_id,
+        task_id = %task_id,
+        source_id = %source_id,
+        path_scope = %owner,
+        "[composio:clickup] ingest: calling ingest_document_with_scope"
+    );
     match ingest_pipeline::ingest_document_with_scope(
         config,
         &source_id,
         &owner,
         tags,
         doc,
         path_scope,
     )
     .await
     {
@@
             tracing::debug!(
                 connection_id = %connection_id,
                 task_id = %task_id,
+                source_id = %source_id,
+                path_scope = %owner,
                 chunks_written,
                 already_ingested,
                 "[composio:clickup] ingest: task persisted"
             );
```
As per coding guidelines, "Add substantial, development-oriented `debug` or `trace` logs on new/changed flows using `log`/`tracing` crate with stable, grep-friendly prefixes" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_sync/composio/providers/clickup/ingest.rs` around lines 107 - 112, Log the derived ClickUp scope (path_scope) at the ingest boundary and include it in the external-call and success/error logs: call out the derived scope returned by clickup_source_scope(connection_id) (owner/path_scope) before calling ingest_pipeline::ingest_document_with_scope, and add trace/debug logging with a stable grep-friendly prefix that records the config, source_id, owner, and path_scope being passed; also ensure the result handling (success and error branches around ingest_pipeline::ingest_document_with_scope) logs the same path_scope alongside success/error details to aid grouping/regression debugging.
src/openhuman/memory_sync/composio/providers/linear/ingest.rs (1)
100-105: ⚡ Quick win
Expose `path_scope` in the Linear ingest logs.
The new connection-scoped behavior is only visible in persisted data right now. Adding `source_id`/`path_scope` to the pre-call and success/error logs will make scope-routing regressions much easier to trace.
Suggested logging shape
```diff
     let tags: Vec<String> = DEFAULT_TAGS.iter().map(|s| s.to_string()).collect();
     let owner = linear_source_scope(connection_id);
     let path_scope = Some(owner.clone());
+    tracing::trace!(
+        connection_id = %connection_id,
+        issue_id = %issue_id,
+        source_id = %source_id,
+        path_scope = %owner,
+        "[composio:linear] ingest: calling ingest_document_with_scope"
+    );
     match ingest_pipeline::ingest_document_with_scope(
         config,
         &source_id,
         &owner,
         tags,
         doc,
         path_scope,
     )
     .await
     {
@@
             tracing::debug!(
                 connection_id = %connection_id,
                 issue_id = %issue_id,
+                source_id = %source_id,
+                path_scope = %owner,
                 chunks_written,
                 already_ingested,
                 "[composio:linear] ingest: issue persisted"
             );
```
As per coding guidelines, "Add substantial, development-oriented `debug` or `trace` logs on new/changed flows using `log`/`tracing` crate with stable, grep-friendly prefixes" and "Log entry/exit, branches, external calls, retries/timeouts, state transitions, and errors with stable grep-friendly prefixes."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_sync/composio/providers/linear/ingest.rs` around lines 100 - 105, Add debug/trace logs around the call to ingest_pipeline::ingest_document_with_scope in ingest.rs so the new connection scope is visible: log a pre-call message that includes source_id and path_scope (owner.clone()) and the call context, and log success and error outcomes with the same stable, grep-friendly prefix showing source_id and path_scope; use the tracing/log crate at debug or trace level and include the variables owner (from linear_source_scope), source_id, and path_scope in the messages to make scope-routing regressions easy to trace.
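The two identifiers these logging nitpicks ask to surface can be illustrated with a small sketch. The `<provider>:<connection_id>` scope shape and the Notion-style `notion:{connection_id}:{page_id}` item id are taken from the PR description; the function names and signatures below are hypothetical and do not mirror `github_source_scope` or its siblings.

```rust
/// Connection-level scope shared by every item synced through one connection.
/// Hypothetical helper; the `<provider>:<connection_id>` shape is the one the
/// PR description says is currently used for the affected selectors.
fn source_scope(provider: &str, connection_id: &str) -> String {
    format!("{provider}:{connection_id}")
}

/// Per-item dedupe key, modeled on the Notion shape quoted in the PR
/// description. Other providers may build their ids differently.
fn item_source_id(provider: &str, connection_id: &str, item_id: &str) -> String {
    format!("{provider}:{connection_id}:{item_id}")
}
```

Two pages from the same connection get distinct `source_id` values but one shared scope, which is why logging both fields at the ingest boundary makes a grouping regression visible.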
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/memory_store/content/compose.rs`:
- Around line 139-140: Add a debug/tracing log around the new tag-seeding branch
so it records which scope is used: import tracing::debug in this module, then
before calling with_source_tag use
meta.path_scope.as_deref().unwrap_or(&meta.source_id) (source_scope) and emit a
debug log with a stable prefix like "[memory-compose] seeding source tag" plus
fields scope=source_scope, source_id=meta.source_id and
using_path_scope=meta.path_scope.is_some(); then call
with_source_tag(source_scope, &meta.tags). This ensures source_scope,
meta.path_scope and with_source_tag are the correlated symbols referenced in the
log.
- Around line 139-140: The front-matter tag rewrite currently loses
connection-level scoping because compose.rs seeds source_scope from
meta.path_scope but does not persist path_scope into the front-matter; update
the code that serializes front-matter to include path_scope when
meta.path_scope.is_some(), and change
src/openhuman/memory_store/content/tags.rs::augment_with_source_tag_for_chunk to
read scan_fm_field(fm, "path_scope") first and use that value for the
source/<...> tag, falling back to scan_fm_field(fm, "source_id") if path_scope
is absent; also add a unit test that sets path_scope = Some("...") and verifies
the rewrite path preserves the connection-level source/<...> tag.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fdd957ba-7413-4d97-9abf-8598d3edde29
📒 Files selected for processing (12)
- AGENTS.md
- CLAUDE.md
- src/openhuman/agent/triage/escalation.rs
- src/openhuman/channels/proactive.rs
- src/openhuman/memory_store/chunks/connection.rs
- src/openhuman/memory_store/chunks/store.rs
- src/openhuman/memory_store/chunks/store_tests.rs
- src/openhuman/memory_store/content/compose.rs
- src/openhuman/memory_sync/composio/providers/clickup/ingest.rs
- src/openhuman/memory_sync/composio/providers/github/ingest.rs
- src/openhuman/memory_sync/composio/providers/linear/ingest.rs
- src/openhuman/memory_sync/composio/providers/notion/ingest.rs
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/openhuman/agent/triage/escalation.rs (1)
244-249: ⚖️ Poor tradeoff
Consider refactoring the test guard to use registry mocking instead of conditional compilation.
The `#[cfg(test)]` guard inside `dispatch_target_agent` mixes test concerns into production code. While safe (the prefix is unlikely to collide with real agent IDs), this pattern makes the function's contract less clear. A cleaner approach would be to mock or stub the `AgentDefinitionRegistry` in tests to return `None` for the missing agent, allowing the production error path at line 268 to handle it naturally.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/agent/triage/escalation.rs` around lines 244 - 249, The #[cfg(test)] conditional inside dispatch_target_agent introduces test-only behavior; remove this test guard and the special-case return so dispatch_target_agent always follows the normal production path (letting the existing Err path for missing agent definitions run). Update tests that relied on the missing-agent- prefix to instead mock or stub AgentDefinitionRegistry (or the method that looks up definitions) so it returns None for the test agent id, or inject a test registry implementation into dispatch_target_agent's caller/context; ensure tests verify the same error is produced by the standard not-found branch rather than using the cfg(test) shortcut.
src/openhuman/memory_tree/tree/rpc.rs (1)
1135-1150: ⚖️ Poor tradeoff
Environment-dependent test reduces determinism.
The test now accepts either `"running"` or `"degraded"` status based on provider availability in the test environment. While both outcomes are validated (terminal, non-syncing, reason check for degraded), this makes the test environment-dependent and could mask regressions if the environment changes.
Consider one of these alternatives:
- Mock or stub the provider/embedder availability to force deterministic behavior.
- Split into two tests: one that validates the "running" path with a configured provider, and one that validates the "degraded" path without a provider.
- Use a test configuration flag to explicitly control which outcome is expected.
Based on learnings: tests should be deterministic and avoid hidden global state.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_tree/tree/rpc.rs` around lines 1135 - 1150, The current test's match on out.status.as_str() (checking "running" vs "degraded" and inspecting out.reason) is environment-dependent; make the test deterministic by controlling provider/embedder availability rather than accepting either outcome. Fix by changing the test harness that calls the code which produces out (the code that sets out.status/out.reason) to explicitly configure provider availability (e.g., set a test config, environment variable, or call the existing test helper to enable/disable the embedder) and then assert the single expected status ("running" when provider enabled, or "degraded" and the semantic-recall reason when disabled). Alternatively split the single test into two tests: one that forces provider present and asserts out.status == "running" and out.reason.is_none(), and one that forces provider absent and asserts out.status == "degraded" with the semantic recall message; update any setup/teardown to stub/mock the provider accordingly so out.status/out.reason are deterministic.
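One way to read the "split into two tests" alternative is to make provider availability an explicit input, so each test pins a single outcome. The sketch below is illustrative only: `PipelineStatus`, `pipeline_status`, and the reason text are hypothetical placeholders and do not reflect the real RPC types in `rpc.rs`.

```rust
/// Hypothetical, simplified status report; the real RPC output has more fields.
#[derive(Debug, PartialEq)]
struct PipelineStatus {
    status: &'static str,
    reason: Option<&'static str>,
}

/// Provider availability is passed in explicitly instead of being discovered
/// from the environment, so callers (and tests) decide which path runs.
fn pipeline_status(embedder_available: bool) -> PipelineStatus {
    if embedder_available {
        PipelineStatus {
            status: "running",
            reason: None,
        }
    } else {
        PipelineStatus {
            status: "degraded",
            // Placeholder wording; the real reason message is not quoted in the review.
            reason: Some("semantic recall unavailable"),
        }
    }
}
```

With this shape, one test calls `pipeline_status(true)` and asserts `"running"` with no reason, and a second calls `pipeline_status(false)` and asserts `"degraded"` with a reason, so neither depends on what the host machine has configured.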
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/memory_store/content/compose.rs`:
- Around line 143-148: The debug log in content_store::compose that prints
meta.source_id and source_scope may leak PII; update the log::debug call in
src/openhuman/memory_store/content/compose.rs to wrap meta.source_id and
source_scope with redact() (e.g., redact(&meta.source_id),
redact(&source_scope)) and leave meta.path_scope.is_some() as-is so the boolean
is preserved; ensure you import or reference the existing redact function used
across the codebase and update the log invocation to pass the redacted values
instead of raw ones.
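A rough sketch of the redacted log shape follows. The `redact` function here is a hypothetical stand-in, since the signature and output of the codebase's own helper are not shown in this review, and the line is built with `format!` rather than `log::debug!` so the example stays dependency-free. The `[memory-compose] seeding source tag` prefix is the one suggested in the earlier review round.

```rust
/// Hypothetical stand-in for the codebase's `redact` helper: it hides the
/// value entirely and keeps only its length for debugging.
fn redact(value: &str) -> String {
    format!("<redacted:{}>", value.len())
}

/// Builds the debug line with both identifiers redacted while leaving the
/// `using_path_scope` boolean readable.
fn seed_log_line(source_id: &str, path_scope: Option<&str>) -> String {
    let scope = path_scope.unwrap_or(source_id);
    format!(
        "[memory-compose] seeding source tag scope={} source_id={} using_path_scope={}",
        redact(scope),
        redact(source_id),
        path_scope.is_some()
    )
}
```

The boolean still tells a reader which branch was taken, while neither the connection id nor the item id appears in the log output.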
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f9c95d95-62d0-4cda-8870-78a497a887e1
📒 Files selected for processing (4)
- src/openhuman/agent/triage/escalation.rs
- src/openhuman/memory_store/content/compose.rs
- src/openhuman/memory_store/content/tags.rs
- src/openhuman/memory_tree/tree/rpc.rs
Summary
- `metadata.path_scope` in `mem_tree_chunks` so extraction workers and reloaded chunks keep source grouping intact.
- `source/...` tags from `path_scope` when present, preventing one raw source folder/tag per selected item.
- `AGENTS.md` and `CLAUDE.md`.
Problem
Composio selector providers were mixing document dedupe identity with source/archive identity. Notion used `notion:{connection_id}:{page_id}` as the document source id, which caused every selected page to create a separate raw source folder and graph source. The same per-item pattern existed in GitHub issues/PRs, Linear issues, and ClickUp tasks.
Solution
- `metadata.source_id` values for dedupe and item-level ingest bookkeeping.
- `metadata.path_scope` with `ingest_document_with_scope` to a stable collection identity, currently `<provider>:<connection_id>` for the affected Composio selectors.
- `path_scope` column and migration for `mem_tree_chunks`, and hydrate it back into `Metadata` on reads.
- `path_scope` when present, falling back to `source_id` for legacy chunks.
Submission Checklist
- `diff-cover`) meet the gate enforced by `.github/workflows/pr-ci.yml`. Run `pnpm test:coverage` and `pnpm test:rust` locally; PRs below 80% on changed lines will not merge. Local full coverage not run; CI enforces changed-line coverage. Initial CI exposed unrelated full-suite Rust flakes; follow-up commit 5448794 stabilizes those tests. CodeRabbit follow-up commit 1b25aaf preserves scoped source tags during extraction tag rewrites. CI follow-up commit 97abc5f stabilizes the Rust core coverage failures exposed by the full-suite order. Follow-up commit 287422e makes the triage failed-event assertions tolerate approval-denied and missing-agent failure paths. Follow-up commit 95b2d56 redacts source identifiers in the scoped tag debug log.
- `## Related` - N/A: no coverage matrix feature IDs affected.
- `docs/RELEASE-MANUAL-SMOKE.md`) - N/A: no release manual smoke surface changed.
- `Closes #NNN` in the `## Related` section - N/A: no GitHub issue provided.
Impact
This affects Rust memory ingest and memory chunk storage. Existing chunks continue to read with `path_scope = None` and fall back to the old `source_id` behavior. New selector ingests group user-visible raw source folders and Obsidian source tags under the stable selector scope. Existing bad raw folders are not migrated by this PR.
Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
- `fix/memory-raw-source-identity`
- `95b2d564`
Validation Run
- `pnpm --filter openhuman-app format:check` via `pnpm format:check`
- `pnpm typecheck` via `pnpm compile` after `pnpm install --frozen-lockfile` refreshed local node_modules
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib memory_sync::composio::providers::notion::ingest::tests::ingest_page_writes_to_memory_tree`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib memory_sync::composio::providers::github::ingest::tests::ingest_issue_writes_to_memory_tree`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib memory_sync::composio::providers::linear::ingest::tests::ingest_issue_writes_to_memory_tree`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib memory_sync::composio::providers::clickup::ingest::tests::ingest_task_writes_to_memory_tree`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib upsert_persists_path_scope`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib schema_has_content_path_and_content_sha256_columns`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::memory_store::content::compose::tests::compose_persists_path_scope_and_seeds_scoped_source_tag -- --nocapture`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::memory_store::content::tags::tests::update_chunk_tags_prefers_path_scope_for_source_tag -- --nocapture`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib path_scope -- --nocapture`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::agent::triage::escalation::tests::apply_decision_react_failure_publishes_failed_event -- --nocapture`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::agent::triage::escalation::tests::apply_decision_escalate_failure_publishes_failed_event -- --nocapture`
- `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::memory_tree::tree::rpc::tests::pipeline_status_reports_chunk_aggregates_after_ingest -- --nocapture`
- `cargo fmt --manifest-path Cargo.toml`; `GGML_NATIVE=OFF cargo check --manifest-path Cargo.toml`
- `pnpm rust:check` / `cargo check --manifest-path app/src-tauri/Cargo.toml`
Validation Blocked
- `command:` `GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib`
- `error:` repo-wide run completed with `10630 passed; 329 failed; 23 ignored`
- `impact:` broad suite is not a clean signal for this patch; focused memory ingest/store tests listed above passed.
Behavior Changes
Parity Contract
- `metadata.source_id` remains for dedupe and source-ingested checks.
- `path_scope` fall back to `source_id` when composing source tags.
Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Bug Fixes
Documentation