feat(mcp): add memory navigation and entity discovery tools#1974
Conversation
Three new read-only tools on top of the surface from tinyhumansai#1760 / tinyhumansai#1790: - `tree.browse` → `openhuman.memory_tree_list_chunks`: paginated chunk listing with source / entity / time-window filters. Lets the LLM enumerate ("what's recent in my Gmail", "everything from last week about Alice") without inventing keywords for `memory.search`. - `tree.top_entities` → `openhuman.memory_tree_top_entities`: most- referenced canonical entities for discovery before drilling in with `tree.browse` (passing `entity_ids`) or `memory.search`. - `tree.list_sources` → `openhuman.memory_tree_list_sources`: distinct ingest sources with chunk counts and last activity, so the LLM can scope follow-up queries to sources that actually have data instead of fishing on dead ones. Same pipeline as the existing tools: `build_rpc_params` → `validate_controller_params` → `enforce_read_policy(Read)`, reusing the JSON-RPC error mapping from tinyhumansai#1790. Argument naming matches the controller schemas (`source_kinds`, `source_ids`, `entity_ids`, `since_ms`, `until_ms`, `query`, `k`/`limit`, `offset`); `k` is capped at MAX_LIMIT (50) for parity with `memory.search` / `memory.recall`. Refined from the original sketch in tinyhumansai#1908: `tree.search_entities` was proposed there but its backing `memory_tree_search_entities` lives in the retrieval module, not the controller registry. Pivoted to `tree.top_entities` for entity discovery; `tree.list_sources` replaces `integrations.list_connections` since the memory-tree source list answers the LLM's "what data do I have" question more directly than OAuth connection state. 13 new unit tests cover happy + failure paths per tool plus boundary cases (`k` at MAX_LIMIT, blank `kind`, non-array `source_kinds`, string-typed `since_ms`, blank array entries silently dropped).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds three read-only MCP tools (tree.browse, tree.top_entities, tree.list_sources), their tool specs and JSON input schemas, parameter-mapping and validation (including k→limit and new parsing helpers), unit tests covering accepted/rejected inputs, and expanded documentation/smoke-test expectations. ChangesMCP Tool Surface Extension
Sequence DiagramsequenceDiagram
participant Client
participant MCPServer
participant ControllerRPC
Client->>MCPServer: tools.call(tool="tree.browse", args)
MCPServer->>MCPServer: validate args, map k→limit, apply parsers
MCPServer->>ControllerRPC: openhuman.memory_tree_list_chunks(params)
ControllerRPC-->>MCPServer: results
MCPServer-->>Client: tool response (results)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
gitbooks/developing/mcp-server.md (1)
49-56:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix inconsistent smoke test example.
Line 49 states the response should include "all six tool names from
tools/list", but the example JSON at line 56 still shows only three tools in the array:{"tools":[{"name":"memory.search",...},{"name":"memory.recall",...},{"name":"tree.read_chunk",...}]}The example should be updated to show all six tools to match the text description.
📝 Proposed fix to show all six tools in the example
```text {"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-06-18","capabilities":{"tools":{}},"serverInfo":{"name":"openhuman-core","version":"<crate version>"},"instructions":"..."}} -{"jsonrpc":"2.0","id":2,"result":{"tools":[{"name":"memory.search",...},{"name":"memory.recall",...},{"name":"tree.read_chunk",...}]}} +{"jsonrpc":"2.0","id":2,"result":{"tools":[{"name":"memory.search",...},{"name":"memory.recall",...},{"name":"tree.read_chunk",...},{"name":"tree.browse",...},{"name":"tree.top_entities",...},{"name":"tree.list_sources",...}]}}</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In
@gitbooks/developing/mcp-server.mdaround lines 49 - 56, The smoke test
example is inconsistent: the text says the response must include
capabilities.tools from initialize and all six tool names from tools/list, but
the JSON example under the second response only shows three tools; update that
example so the second JSON line (the response with "id":2 and
"result":{"tools":[...]}) lists all six tool entries (e.g., include
memory.search, memory.recall, tree.read_chunk, tree.browse, tree.top_entities,
tree.list_sources) so it matches the description and still preserves the
two-line compact JSON output and the note that notifications/initialized is
separate.</details> </blockquote></details> </blockquote></details>🧹 Nitpick comments (2)
src/openhuman/mcp_server/tools.rs (2)
329-379: ⚡ Quick winAdd debug/trace instrumentation to the new param-mapping and parsing paths.
The new
tree.*branches and optional parsers add multiple branch/error paths without local diagnostics, which will make MCP/controller mismatch triage harder. Please add stable-prefixeddebug/tracelogs for branch entry, parse failures, and blank-entry drops (keys/counts only, no raw values/PII).Proposed logging pattern
"tree.browse" => { + log::debug!( + "[mcp_server] build_rpc_params tool=tree.browse arg_keys={:?}", + args.keys().collect::<Vec<_>>() + ); reject_unexpected_arguments(&args, TREE_BROWSE_ARGUMENTS)?; let mut params = Map::new(); @@ fn optional_string_array( args: &Map<String, Value>, key: &str, ) -> Result<Option<Vec<String>>, ToolCallError> { @@ - let mut out = Vec::with_capacity(items.len()); + let mut out = Vec::with_capacity(items.len()); + let mut dropped_blank = 0usize; @@ let trimmed = s.trim(); if trimmed.is_empty() { + dropped_blank += 1; continue; } out.push(trimmed.to_string()); } + if dropped_blank > 0 { + log::trace!( + "[mcp_server] tools/optional_string_array dropped_blank_entries key={} count={}", + key, + dropped_blank + ); + } Ok(Some(out)) }As per coding guidelines: "Add substantial, development-oriented logs (using
log/tracingatdebugortracelevel in Rust) to new/changed flows at entry/exit points, branch decisions, external calls, retries/timeouts, state transitions, and error handling paths."Also applies to: 432-516
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/mcp_server/tools.rs` around lines 329 - 379, The new tree.* branches (handlers for "tree.browse", "tree.top_entities", "tree.list_sources") lack development-level diagnostics; add tracing/log statements at debug/trace level around branch entry/exit, before/after calls to reject_unexpected_arguments and the optional_* parsers (optional_limit, optional_string_array, optional_non_empty_string, optional_i64, optional_u64), and on parse failures or when a parsed optional is skipped; use a stable message prefix (e.g., "mcp:tree:") and only emit non-PII metadata (key names and counts/flags, not raw values), and include context such as which branch (tree.browse/top_entities/list_sources) and which param was dropped or failed to parse so controller/MCP mismatch triage is possible.
611-909: 🏗️ Heavy liftSplit
tools.rsinto submodules to stay within repository size guidance.This module now mixes tool specs, schemas, RPC param translation, helpers, and a large test block in ~900+ lines. Please split it (for example
tools/specs.rs,tools/params.rs,tools/tests.rs) to keep ownership and reviewability manageable.As per coding guidelines: "Keep Rust source files to ≤ ~500 lines; split modules when growing larger."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/mcp_server/tools.rs` around lines 611 - 909, tools.rs is too large and should be split into smaller submodules; extract logical sections (specs, params/translation, schema/helpers, and tests) into separate files and re-export them so existing callers keep the same symbols. Create e.g. tools/specs.rs containing tool_specs, list_tools_result and any spec-related types; tools/params.rs containing build_rpc_params, RPC-translation helpers and DEFAULT_LIMIT/MAX_LIMIT; tools/errors_or_helpers.rs for ToolCallError and schema_for_rpc_method wrappers; and tools/tests.rs containing the #[cfg(test)] mod tests block; then update src/openhuman/mcp_server/tools.rs to declare mod specs; mod params; mod errors_or_helpers; #[cfg(test)] mod tests; and pub use the key symbols (tool_specs, list_tools_result, build_rpc_params, ToolCallError, DEFAULT_LIMIT, MAX_LIMIT, all::schema_for_rpc_method) so existing code compiles without call-site changes; ensure internal use paths are adjusted (e.g., replace self::... or super::... with correct module paths) and run cargo test to fix any visibility/import issues.🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. Inline comments: In `@gitbooks/developing/mcp-server.md`: - Around line 33-37: Add explicit default and cap information for the k parameter on both tree.browse and tree.top_entities: update the tree.browse parameter description to state that k defaults to 10 and is capped at 50, and update tree.top_entities to state its k default and cap to match the actual implementation (fetch the values from the tree.top_entities implementation and document them exactly); modify the doc lines describing tree.browse and tree.top_entities to include these defaults/caps so they are consistent with memory.search and memory.recall. --- Outside diff comments: In `@gitbooks/developing/mcp-server.md`: - Around line 49-56: The smoke test example is inconsistent: the text says the response must include capabilities.tools from initialize and all six tool names from tools/list, but the JSON example under the second response only shows three tools; update that example so the second JSON line (the response with "id":2 and "result":{"tools":[...]}) lists all six tool entries (e.g., include memory.search, memory.recall, tree.read_chunk, tree.browse, tree.top_entities, tree.list_sources) so it matches the description and still preserves the two-line compact JSON output and the note that notifications/initialized is separate. --- Nitpick comments: In `@src/openhuman/mcp_server/tools.rs`: - Around line 329-379: The new tree.* branches (handlers for "tree.browse", "tree.top_entities", "tree.list_sources") lack development-level diagnostics; add tracing/log statements at debug/trace level around branch entry/exit, before/after calls to reject_unexpected_arguments and the optional_* parsers (optional_limit, optional_string_array, optional_non_empty_string, optional_i64, optional_u64), and on parse failures or when a parsed optional is skipped; use a stable message prefix (e.g., "mcp:tree:") and only emit non-PII metadata (key names and counts/flags, not raw values), and include context such as which branch (tree.browse/top_entities/list_sources) and which param was dropped or failed to parse so controller/MCP mismatch triage is possible. - Around line 611-909: tools.rs is too large and should be split into smaller submodules; extract logical sections (specs, params/translation, schema/helpers, and tests) into separate files and re-export them so existing callers keep the same symbols. Create e.g. tools/specs.rs containing tool_specs, list_tools_result and any spec-related types; tools/params.rs containing build_rpc_params, RPC-translation helpers and DEFAULT_LIMIT/MAX_LIMIT; tools/errors_or_helpers.rs for ToolCallError and schema_for_rpc_method wrappers; and tools/tests.rs containing the #[cfg(test)] mod tests block; then update src/openhuman/mcp_server/tools.rs to declare mod specs; mod params; mod errors_or_helpers; #[cfg(test)] mod tests; and pub use the key symbols (tool_specs, list_tools_result, build_rpc_params, ToolCallError, DEFAULT_LIMIT, MAX_LIMIT, all::schema_for_rpc_method) so existing code compiles without call-site changes; ensure internal use paths are adjusted (e.g., replace self::... or super::... with correct module paths) and run cargo test to fix any visibility/import issues.🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
e2590296-a9fa-4e5d-abea-bb16e10b0e87📒 Files selected for processing (3)
gitbooks/developing/mcp-server.mdsrc/openhuman/mcp_server/protocol.rssrc/openhuman/mcp_server/tools.rs
| `memory.search` and `memory.recall` accept `query` plus optional `k` (default | ||
| 10, capped at 50). `tree.read_chunk` accepts `chunk_id`. | ||
| 10, capped at 50). `tree.read_chunk` accepts `chunk_id`. `tree.browse` | ||
| accepts optional `source_kinds`, `source_ids`, `entity_ids`, `since_ms`, | ||
| `until_ms`, `query`, `k`, and `offset`. `tree.top_entities` accepts optional | ||
| `kind` and `k`. `tree.list_sources` accepts an optional `user_email_hint`. |
There was a problem hiding this comment.
Document k parameter defaults and caps for tree.browse and tree.top_entities.
The parameter descriptions explicitly state defaults and caps for k in memory.search and memory.recall but omit them for tree.browse and tree.top_entities, which also accept k. For consistency and clarity, please document the defaults and caps for these tools as well.
According to the PR summary, tree.browse uses "k defaults to 10, is capped at 50" (same as memory.search), while tree.top_entities has its own cap.
📝 Proposed fix for consistent parameter documentation
-`memory.search` and `memory.recall` accept `query` plus optional `k` (default
-10, capped at 50). `tree.read_chunk` accepts `chunk_id`. `tree.browse`
-accepts optional `source_kinds`, `source_ids`, `entity_ids`, `since_ms`,
-`until_ms`, `query`, `k`, and `offset`. `tree.top_entities` accepts optional
-`kind` and `k`. `tree.list_sources` accepts an optional `user_email_hint`.
+`memory.search` and `memory.recall` accept `query` plus optional `k` (default
+10, capped at 50). `tree.read_chunk` accepts `chunk_id`. `tree.browse`
+accepts optional `source_kinds`, `source_ids`, `entity_ids`, `since_ms`,
+`until_ms`, `query`, `k` (default 10, capped at 50), and `offset`.
+`tree.top_entities` accepts optional `kind` and `k` (default and cap: 50).
+`tree.list_sources` accepts an optional `user_email_hint`.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@gitbooks/developing/mcp-server.md` around lines 33 - 37, Add explicit default
and cap information for the k parameter on both tree.browse and
tree.top_entities: update the tree.browse parameter description to state that k
defaults to 10 and is capped at 50, and update tree.top_entities to state its k
default and cap to match the actual implementation (fetch the values from the
tree.top_entities implementation and document them exactly); modify the doc
lines describing tree.browse and tree.top_entities to include these
defaults/caps so they are consistent with memory.search and memory.recall.
Addresses CodeRabbit review on tinyhumansai#1974: - `gitbooks/developing/mcp-server.md` — the smoke-test example response was still showing only the 3 Phase-1 tools while the surrounding text already mentioned "all six tool names from `tools/list`". Extended the example to enumerate all six so the doc and example agree. - `tools.rs::optional_string_array` — log a `trace`-level line when blank entries are dropped so a downstream "the filter didn't match anything" bug is triagable from logs alone. Per the CLAUDE.md verbose-diagnostics rule for new/changed flows.
|
Addressed CodeRabbit review in
|
|
Hey hey @justinhsu1477 nice to meeta ya! great PR just resolve the merge conflicts and then good to merge. |
# Conflicts: # src/openhuman/mcp_server/protocol.rs # src/openhuman/mcp_server/tools.rs
|
Merged upstream/main (
|
|
@senamakel Thanks for the speedy merge and the welcome — super fun to work on this !! |
Summary
tree.browse— paginated chunk listing with source / entity / time-window / keyword filters.tree.top_entities— most-referenced canonical entities (people, organizations, topics, emails) for discovery.tree.list_sources— distinct ingest sources with chunk counts and last-activity timestamps.All three are read-only, route through the existing controller registry +
enforce_read_policy(Read), and reuse the JSON-RPC error mapping from #1790.Problem
The MCP surface from #1760 / #1790 handles "I know roughly what I'm looking for, find it" via
memory.search/memory.recallplustree.read_chunk. Three gaps show up the moment an MCP client tries to be useful beyond direct keyword recall — they're the gaps tracked in #1908:memory.search/memory.recalland hope the query is right. "What's recent in my Gmail?" forces the LLM to invent keywords instead of paging through.memory.searchagainst keywords that have no chance of matching anything.Solution
tree.browseopenhuman.memory_tree_list_chunkskfollows the Phase 2a contract (default 10, cap 50, reject on overflow). MCP-sidekis renamed to the controller'slimitinbuild_rpc_params.tree.top_entitiesopenhuman.memory_tree_top_entitieskat MAX_LIMIT before passing to the controller (whoselimitis required).tree.list_sourcesopenhuman.memory_tree_list_sourcesuser_email_hintpasses through unchanged.Argument names mirror the controller schemas (
source_kinds,source_ids,entity_ids,since_ms,until_ms,query,k,offset,kind,user_email_hint). Eachtools/callpath runs throughbuild_rpc_params→validate_controller_params→enforce_read_policy(ToolOperation::Read)just like the existing tools.Four new helpers in
tools.rscover the optional-typed-argument plumbing the new tools need:optional_non_empty_string,optional_string_array,optional_i64,optional_u64. Each rejects type-mismatched input asInvalidParamswith a message that names the offending key.Refinement vs the original sketch in #1908:
tree.search_entitieswas originally proposed butmemory_tree_search_entitieslives only in the retrieval module — it isn't a registered controller method. Pivoted totree.top_entitiesfor entity discovery;tree.list_sourcesreplacesintegrations.list_connectionssince the memory-tree source list answers the LLM's "what data do I have" question more directly than OAuth connection state.Submission Checklist
tools.rsplus thetools_list_returns_curated_toolsassertion inprotocol.rsupdated to the new 6-tool surface.cargo test --lib mcp_server::runs 33 tests (0 failed). Each new tool has happy-path + at least one rejection path; helpers covered indirectly by the tool-level tests.N/A: behaviour-only extension of existing row 11.1.4## RelatedN/A: opt-in MCP server, not on release-cut pathCloses #NNNin the## Relatedsection —Closes #1908Impact
openhuman-core mcpsubcommand only. No HTTP RPC, web, or mobile impact. Tool count goes from 3 → 6.tools/liston next launch. No change to the existing 3 tools' behavior.tree.browseproxies to the controller's existing paginated list — same DB shape, capped at 50 rows per page from the MCP side. No new indexes / no migrations.SecurityPolicy::enforce_tool_operation(ToolOperation::Read, …)like the existing surface.Related
11.1.4(MCP stdio server)AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
pnpm --filter openhuman-app format:check— N/A (no JS/TS changes)pnpm typecheck— N/A (no JS/TS changes)cargo test --lib mcp_server::— 33 passed, 0 failedcargo fmt --checkclean;cargo check --libcompiles;cargo clippy --lib --no-depsno mcp_server warningsapp/src-taurichanges)Validation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
tools/listreturns 6 tools instead of 3 (addstree.browse,tree.top_entities,tree.list_sources).Parity Contract
memory.search,memory.recall,tree.read_chunktools unchanged. All Phase 1/2a tests still pass (20/20 of the prior tests + 13 new).build_rpc_paramsadds three new arms (no fall-through change);enforce_read_policyreused unchanged.Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Documentation
Bug Fixes / Behavior