feat(mcp): add SearXNG search tool #1988
📝 Walkthrough
Adds a configurable SearXNG MCP tool: new config/schema with env overrides, a reqwest-based SearXNG client that normalizes results, MCP tool-specs and RPC handlers, orchestrator registration, tests, and documentation updates. The tool is listed and registered only when enabled.
Changes
SearXNG Self-Hosted Web Search Tool Integration
Sequence Diagram
sequenceDiagram
participant Client as Agent/MCP Client
participant Protocol as tools_schemas / MCP call
participant Tool as SearxngSearchTool
participant Searxng as SearXNG Instance
Client->>Protocol: tools_searxng_search {query,categories,language,max_results}
Protocol->>Tool: parse args & call search()
Tool->>Searxng: GET /search?format=json&q=...&categories=...&language=...
Searxng-->>Tool: raw JSON response
Tool->>Tool: parse & normalize results -> {title,url,snippet,source}[]
Tool-->>Protocol: SearxngSearchResponse {query, results}
Protocol-->>Client: JSON-RPC success
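The `GET /search?format=json&q=...&categories=...&language=...` step in the diagram can be sketched with the standard library alone. This is an illustration, not the PR's code: the PR uses a reqwest-based client, which presumably handles query encoding itself, and the names `encode_component` and `build_search_url` are hypothetical. The comma-separated `categories` value follows the SearXNG search API convention.

```rust
/// Percent-encode a query-string value, leaving only RFC 3986 unreserved
/// characters untouched. (Hypothetical helper; not taken from the PR.)
fn encode_component(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build `<base>/search?format=json&q=...`, appending `categories` and
/// `language` only when they are provided. (Hypothetical helper.)
fn build_search_url(
    base: &str,
    query: &str,
    categories: &[&str],
    language: Option<&str>,
) -> String {
    let mut url = format!(
        "{}/search?format=json&q={}",
        base.trim_end_matches('/'),
        encode_component(query)
    );
    if !categories.is_empty() {
        // Categories are sent as one comma-separated value.
        url.push_str("&categories=");
        url.push_str(&encode_component(&categories.join(",")));
    }
    if let Some(lang) = language {
        url.push_str("&language=");
        url.push_str(&encode_component(lang));
    }
    url
}
```

Calling `build_search_url("http://localhost:8080/", "rust lang", &["general"], Some("en"))` would produce a URL whose query string carries `format=json`, the encoded query, one category, and the language.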
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openhuman/integrations/mod.rs (1)
1-5: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Update module documentation to reflect mixed integration patterns.
The module documentation states that all tools "proxy through the backend API" and "never talk to external services directly." However, the newly added SearXNG integration calls a user-configured SearXNG endpoint directly (not through the backend API). This creates confusion about the architectural and security model of this module.
📝 Proposed documentation update
-//! Agent integration tools that proxy through the backend API.
+//! Agent integration tools.
 //!
-//! Each tool calls a backend endpoint (authenticated via JWT Bearer token) which
-//! handles external API calls, billing, rate limiting, and markup. The client
-//! never talks to external services directly.
+//! Most tools proxy through the backend API (authenticated via JWT Bearer token)
+//! which handles external API calls, billing, rate limiting, and markup.
+//! Some integrations (e.g., SearXNG) call user-configured endpoints directly
+//! when enabled via configuration.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/integrations/mod.rs` around lines 1 - 5, Module doc comment in src/openhuman/integrations/mod.rs incorrectly states all tools "proxy through the backend API" and "never talk to external services directly"; update that top-level documentation to mention mixed integration patterns and explicitly call out that some integrations (e.g., the SearXNG integration) may call user-configured endpoints directly rather than via the backend. Edit the module-level comment (the docstring around the existing description) to explain the two patterns (backend-proxied vs. client-direct for user-configured endpoints), reference the SearXNG integration by name to clarify its behavior, and include a brief note about the associated security/usage implications so readers are not misled.
🧹 Nitpick comments (2)
src/openhuman/tools/schemas.rs (1)
588-606: 💤 Low value
Consider aligning string array parsing with the MCP layer.
The `optional_string_array` helper here does not trim whitespace or filter blank entries, unlike the similar helper in `tools.rs` (lines 645-686), which trims and drops empty strings. While category validation via `normalize_categories` will catch invalid entries, whitespace-only categories like `" "` would pass validation here but fail later.
For consistency, consider trimming entries and filtering blanks, or document the expected behavior difference.
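A minimal sketch of the trim-and-filter behavior suggested above, assuming the helper ends up with a list of string slices after JSON extraction. The name `trim_and_drop_blank` is hypothetical, and the real `optional_string_array` signature is not shown in this review.

```rust
/// Trim every entry and drop the ones that are empty after trimming.
/// (Hypothetical stand-in for the post-processing step of the helper.)
fn trim_and_drop_blank(items: &[&str]) -> Vec<String> {
    items
        .iter()
        .map(|item| item.trim())
        .filter(|item| !item.is_empty())
        .map(str::to_owned)
        .collect()
}
```

With this shape, an input such as `[" news ", "   ", "general"]` would reach category validation as two clean entries, so a whitespace-only category never gets that far.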
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tools/schemas.rs` around lines 588 - 606, The optional_string_array helper currently returns raw strings and will allow whitespace-only entries; update optional_string_array to mirror the MCP layer behavior by trimming each item and filtering out empty/blank strings before returning, i.e., in function optional_string_array iterate over items (in the existing map/collect path), call trim() on each string, skip entries that become empty after trimming, and return the filtered Vec<String>; reference normalize_categories/tools.rs's trimming-and-filtering logic to ensure consistent behavior with category validation downstream.
gitbooks/developing/mcp-server.md (1)
26-42: 💤 Low value
Consider adding a conditional indicator in the tools table.
Line 26 lists `searxng_search` alongside always-available tools without a visual cue (e.g., asterisk, note column) that it appears only when SearXNG is enabled, even though line 34 correctly documents this behavior. Adding such a cue would improve scannability for users checking tool availability.
📝 Optional table enhancement
Add a note column or footnote marker to indicate conditional tools:
 | MCP tool | Backing RPC | Purpose |
 | --- | --- | --- |
-| `searxng_search` | `openhuman.tools_searxng_search` | Search a configured self-hosted SearXNG instance. |
+| `searxng_search`* | `openhuman.tools_searxng_search` | Search a configured self-hosted SearXNG instance. |
 | `memory.search` | `openhuman.memory_tree_search` | Keyword search over memory-tree chunks. |
+
+\* Only listed when enabled in config
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gitbooks/developing/mcp-server.md` around lines 26 - 42, The tools table currently lists searxng_search alongside always-available tools (memory.search, memory.recall, tree.*) without a visual cue that searxng_search is conditional; add a concise indicator (e.g., an asterisk next to `searxng_search` or a dedicated "Note" column) and include a matching footnote or inline note that "searxng_search is present only when SearXNG is enabled" so the table and the explanatory sentence about SearXNG are consistent; update the row for `searxng_search` and the table header/footnote text so readers scanning the table immediately see the conditional availability while keeping the existing descriptive sentence intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/integrations/searxng.rs`:
- Around line 307-312: The current assignment to snippet uses
item.content.or(item.snippet) which preserves a Some(content) even when that
content trims to an empty string, losing a non-empty snippet; change the logic
so you select item.content only if it is Some and its trimmed string is
non-empty, otherwise fall back to item.snippet, and finally to default. Locate
the snippet variable assignment and implement the conditional selection using
item.content and item.snippet (e.g., test trimmed content.is_empty() before
choosing it) so snippet ends up with a trimmed, non-empty value when available.
- Around line 360-369: Change the current tolerant parsing of language and
max_results so that presence with the wrong type produces an error instead of
being treated as missing: for language, inspect object.get("language"): if None
set language = None; if Some(value) then if value.as_str() yields a non-empty
trimmed string set language = Some(string) else return Err("invalid 'language'
parameter"); for max_results, inspect object.get("max_results"): if None set
max_results = None; if Some(value) then if value.as_u64() succeeds convert,
clamp to 1..=MAX_RESULTS and set max_results = Some(usize) else return
Err("invalid 'max_results' parameter"); update the code that constructs language
and max_results (the variables named language and max_results in this file) to
follow this logic and return a clear error on malformed inputs.
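The stricter parsing described in the second inline comment can be sketched as follows. This is only an illustration under stated assumptions: `Arg` is a hypothetical stand-in for a JSON value (the real handler presumably inspects `serde_json::Value`), the function names are invented here, and the cap of 50 is taken from the `MAX_RESULTS` value mentioned elsewhere in this review.

```rust
/// Hypothetical stand-in for a JSON value.
#[derive(Debug, Clone, PartialEq)]
enum Arg {
    Str(String),
    Num(u64),
    Other,
}

/// Assumed cap on results per search.
const MAX_RESULTS: usize = 50;

/// Absent -> Ok(None); non-empty string -> Ok(Some(trimmed)); anything else -> Err.
fn parse_language(arg: Option<&Arg>) -> Result<Option<String>, String> {
    match arg {
        None => Ok(None),
        Some(Arg::Str(s)) if !s.trim().is_empty() => Ok(Some(s.trim().to_owned())),
        Some(_) => Err("invalid 'language' parameter".to_owned()),
    }
}

/// Absent -> Ok(None); unsigned integer -> clamped to 1..=MAX_RESULTS; anything else -> Err.
fn parse_max_results(arg: Option<&Arg>) -> Result<Option<usize>, String> {
    match arg {
        None => Ok(None),
        Some(Arg::Num(n)) => Ok(Some((*n).clamp(1, MAX_RESULTS as u64) as usize)),
        Some(_) => Err("invalid 'max_results' parameter".to_owned()),
    }
}
```

The design point is that a present-but-malformed argument is distinguished from a missing one, so a caller passing `max_results: "10"` gets an error instead of a silent default.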
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2564aea9-cdd4-4277-a0ca-aa98efe6f696
📒 Files selected for processing (19)
.env.example
docs/TEST-COVERAGE-MATRIX.md
gitbooks/developing/mcp-server.md
scripts/feature-ids.json
src/openhuman/about_app/catalog.rs
src/openhuman/about_app/catalog_tests.rs
src/openhuman/config/mod.rs
src/openhuman/config/schema/load.rs
src/openhuman/config/schema/load_tests.rs
src/openhuman/config/schema/mod.rs
src/openhuman/config/schema/tools.rs
src/openhuman/config/schema/types.rs
src/openhuman/integrations/mod.rs
src/openhuman/integrations/searxng.rs
src/openhuman/mcp_server/protocol.rs
src/openhuman/mcp_server/tools.rs
src/openhuman/tools/ops.rs
src/openhuman/tools/ops_tests.rs
src/openhuman/tools/schemas.rs
graycyrus left a comment
Walkthrough
Clean, well-structured addition of a config-gated searxng_search tool. The implementation follows existing integration patterns (Seltz sibling), has good test coverage, and all prior CodeRabbit feedback has been addressed. Two issues worth fixing: the Closes #1842 claim is premature since agent routing/priority/fallback aren't implemented, and the RPC handler rebuilds an HTTP client on every invocation.
Change Summary
| File | Change type | Description |
|---|---|---|
| `.env.example` | Modified | Added SearXNG env vars |
| `docs/TEST-COVERAGE-MATRIX.md` | Modified | Added 11.1.6 row, bumped counts |
| `gitbooks/developing/mcp-server.md` | Modified | Added searxng_search to MCP tool table + config docs |
| `scripts/feature-ids.json` | Modified | Added 11.1.6 |
| `src/openhuman/about_app/catalog.rs` | Modified | Added SearXNG capability entry |
| `src/openhuman/about_app/catalog_tests.rs` | Modified | Added searxng to catalog test |
| `src/openhuman/config/mod.rs` | Modified | Re-export SearxngConfig |
| `src/openhuman/config/schema/load.rs` | Modified | Env override parsing for searxng |
| `src/openhuman/config/schema/load_tests.rs` | Modified | Tests for searxng config loading |
| `src/openhuman/config/schema/mod.rs` | Modified | Re-export SearxngConfig |
| `src/openhuman/config/schema/tools.rs` | Modified | SearxngConfig struct + defaults |
| `src/openhuman/config/schema/types.rs` | Modified | Added searxng field to Config |
| `src/openhuman/integrations/mod.rs` | Modified | Added searxng module + exports |
| `src/openhuman/integrations/searxng.rs` | New | Core SearXNG tool implementation (546 lines) |
| `src/openhuman/mcp_server/protocol.rs` | Modified | list_tools now async |
| `src/openhuman/mcp_server/tools.rs` | Modified | SearXNG tool spec, config-gated listing, build_rpc_params |
| `src/openhuman/tools/ops.rs` | Modified | Register searxng_search in all_tools_with_runtime |
| `src/openhuman/tools/ops_tests.rs` | Modified | Test searxng registration |
| `src/openhuman/tools/schemas.rs` | Modified | Controller schema + handler for searxng_search |
Per-file Analysis
src/openhuman/integrations/searxng.rs (new, 546 lines)
Well-written integration module. Follows the Seltz sibling pattern closely. Good error handling with anyhow::bail, proper logging with [searxng] prefix, trimming/validation on all inputs. The first_non_empty_trimmed helper elegantly solves the content/snippet fallback. Integration test with axum mock server is solid.
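For readers who have not opened the diff, the content/snippet fallback could look roughly like the sketch below. The helper name `first_non_empty_trimmed` comes from the review, but the signature and body here are assumptions, not the PR's actual code.

```rust
/// Return the first candidate that is non-empty after trimming, already trimmed.
/// (Assumed signature; the PR's real helper may differ.)
fn first_non_empty_trimmed(candidates: &[Option<&str>]) -> Option<String> {
    candidates
        .iter()
        .flatten() // skip `None` candidates
        .map(|value| value.trim())
        .find(|value| !value.is_empty())
        .map(str::to_owned)
}
```

Passing the `content` field first and the `snippet` field second means a `content` value made only of whitespace no longer shadows a usable `snippet`, which is the case the earlier `item.content.or(item.snippet)` expression missed.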
src/openhuman/tools/schemas.rs
The handle_searxng_search RPC handler reconstructs SearxngSearchTool (including a reqwest::Client) on every call. This is the main performance concern — see inline comment.
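One possible way to avoid rebuilding the client per call is to construct it once and hand out a shared reference. The sketch below uses `std::sync::OnceLock` with a hypothetical `HttpClient` stand-in for `reqwest::Client`; it is not the fix applied in the PR, and a real version would need to account for configuration (such as the base URL or timeout) changing at runtime.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts constructions so that reuse is observable in this sketch.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an expensive-to-build HTTP client.
struct HttpClient {
    id: usize,
}

impl HttpClient {
    fn build() -> Self {
        let id = BUILDS.fetch_add(1, Ordering::SeqCst) + 1;
        HttpClient { id }
    }
}

/// Build the client on first use and return the same instance afterwards.
fn shared_client() -> &'static HttpClient {
    static CLIENT: OnceLock<HttpClient> = OnceLock::new();
    CLIENT.get_or_init(HttpClient::build)
}
```

Every handler invocation that calls `shared_client()` then sees the same instance, so connection pools and TLS state held by a real client would be reused across requests.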
src/openhuman/mcp_server/tools.rs
Config-gated listing is correctly implemented. The list_tools_result gracefully degrades to base tools if config loading fails. build_rpc_params validates categories eagerly at the MCP boundary.
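The degrade-to-base-tools behavior can be pictured with a small sketch. The `Config` struct, its `searxng_enabled` field, and `list_tool_names` are hypothetical simplifications; only the tool names are taken from this PR's documentation table.

```rust
/// Hypothetical, simplified view of the loaded configuration.
struct Config {
    searxng_enabled: bool,
}

/// Tools that are always listed (names from the MCP tool table).
const BASE_TOOLS: [&str; 2] = ["memory.search", "memory.recall"];

/// List tool names, adding the config-gated tool only when config loaded
/// successfully and the feature is enabled.
fn list_tool_names(config: Result<Config, String>) -> Vec<&'static str> {
    let mut names = BASE_TOOLS.to_vec();
    // A failed load is treated the same as "disabled".
    let enabled = config.map(|c| c.searxng_enabled).unwrap_or(false);
    if enabled {
        names.push("searxng_search");
    }
    names
}
```

Treating a load failure as "disabled" keeps `tools/list` usable even when the config file is broken, at the cost of hiding the gated tool until the config is repaired.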
src/openhuman/mcp_server/protocol.rs
The test relaxation to contains checks reduces strictness — see inline comment.
Additional Findings
[major] PR uses Closes #1842 but does not implement acceptance criteria 5 (agent routing — agents with web-search skill auto-use searxng_search), 6 (priority logic — SearXNG over cloud search), or 7 (fallback handling). The PR itself is a valid, useful subset, but the issue link should be Relates to #1842 to avoid prematurely closing the issue. Alternatively, split criteria 5-7 into a follow-up issue.
[minor] Two separate constants define the same max-results cap: MAX_RESULTS: usize = 50 in integrations/searxng.rs and MAX_LIMIT: u64 = 50 in mcp_server/tools.rs. If one changes without the other, the MCP layer will reject at a different threshold than the tool layer clamps. Consider importing a single source of truth.
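A sketch of the single-source-of-truth idea, assuming both layers can see one shared constant. In the real code the two constants live in different modules, and `validate_limit` and `clamp_results` are hypothetical names for the reject-at-the-boundary and clamp-in-the-tool behaviors described above.

```rust
/// Single definition of the result cap (value taken from the review).
pub const MAX_RESULTS: usize = 50;

/// Derived from `MAX_RESULTS`, so the MCP boundary cannot drift from the tool layer.
pub const MAX_LIMIT: u64 = MAX_RESULTS as u64;

/// MCP boundary: reject out-of-range limits before dispatch.
pub fn validate_limit(requested: u64) -> Result<u64, String> {
    if (1..=MAX_LIMIT).contains(&requested) {
        Ok(requested)
    } else {
        Err(format!("max_results must be between 1 and {}", MAX_LIMIT))
    }
}

/// Tool layer: clamp whatever arrives into the supported range.
pub fn clamp_results(requested: usize) -> usize {
    requested.clamp(1, MAX_RESULTS)
}
```

Changing `MAX_RESULTS` then moves both thresholds together, which removes the mismatch the finding warns about.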
Addressed @graycyrus review in 9e5d367:
Validated locally with:
Pre-push checks also passed.
@graycyrus I addressed the requested changes and CI is now green. Could you please re-review when you have a chance?
Summary
- `searxng_search` tool for private, self-hosted web search through SearXNG.
- `tools/list` / `tools/call` surfaces.
Problem
Solution
- `openhuman::integrations::searxng` with a small HTTP client wrapper that calls SearXNG JSON search, enforces max result bounds, maps `web` to SearXNG `general`, skips invalid result rows, and logs grep-friendly checkpoints.
- `[searxng]` config plus `OPENHUMAN_SEARXNG_*` / `SEARXNG_*` env overrides. The feature is disabled by default.
- `searxng_search` in optional agent tools, `openhuman.tools_searxng_search` in the controller registry, and MCP `searxng_search` as a config-gated tool.
- `.env.example`, the runtime capability catalog, feature IDs, and the coverage matrix.
Submission Checklist
- `pnpm test:coverage` was not run locally, and CI `diff-cover` remains authoritative.
- `11.1.6 SearXNG MCP search` to `docs/TEST-COVERAGE-MATRIX.md`.
- `## Related`.
- `docs/RELEASE-MANUAL-SMOKE.md`) - N/A: no release-cut manual smoke surface changed.
Impact
- `searxng_search` is omitted from `tools/list` unless enabled.
Related
- `11.1.6`
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
- `codex/OH-1842-searxng-mcp-tool`
- `9e5d367cb39923131f436fb2735465d34ab1200f`
Validation Run
- `pnpm --filter openhuman-app format:check`
- `pnpm typecheck`
- `cargo test --manifest-path Cargo.toml searxng -- --nocapture`; `cargo test --manifest-path Cargo.toml mcp_server -- --nocapture`; `cargo test --manifest-path Cargo.toml all_tools_registers_optional_search_lsp_and_tool_stats_when_enabled -- --nocapture`; `cargo test --manifest-path Cargo.toml about_app::catalog -- --nocapture`; `pnpm debug rust searxng`
- `cargo fmt --manifest-path Cargo.toml --all --check`; `cargo check --manifest-path Cargo.toml`; `cargo build --manifest-path Cargo.toml --bin openhuman-core`
- `pnpm --filter openhuman-app rust:check` via pre-push hook; Tauri shell unchanged.
Validation Blocked
- `command:` N/A
- `error:` N/A
- `impact:` N/A
Behavior Changes
- `searxng_search` as a config-gated agent/RPC/MCP tool backed by a private SearXNG instance.
- `[searxng]` can let agents search their configured self-hosted SearXNG endpoint from MCP-compatible clients.
Parity Contract
- `tools/list` omits config-gated tools on disabled or failed config load; `tools/call` rejects unknown args/categories and validates result limits before RPC dispatch.
Duplicate / Superseded PR Handling
- `gh pr list --repo tinyhumansai/openhuman --state open --search "searxng OR SearXNG OR 1842"`.
Summary by CodeRabbit
New Features
Documentation
Tests
Chores