feat(mcp): per-tool database parameter and persist shorthand#31
Merged
StefanSteiner merged 7 commits intoMay 25, 2026
Merged
Conversation
Add the core primitives that tool handlers will use to route operations to the persistent database or user-attached databases: - IngestOptions gains `target_db: Option<String>` field - `qualified_table(opts)` helper builds fully-qualified SQL identifiers - `Engine::scoped_search_path(alias)` RAII guard for user-SQL routing - `Engine::create_table_in(name, cols, replace, target_db)` for qualified DDL - `HyperMcpServer::resolve_db(engine, database, persist, require_writable)` resolves persist sugar, filters LOCAL_ALIAS, validates writable All existing call sites pass `target_db: None` — no behavior change.
Wire the per-tool database routing through load_data and load_file: - LoadDataParams, LoadFileParams gain `database: Option<String>` and `persist: Option<bool>` parameters - LoadFilesParams gains the fields but rejects non-primary targets (pool architecture limitation; use load_file instead) - ingest_json, ingest_csv, ingest_csv_file: SQL now uses qualified_table(opts) for INSERT/COPY statements - ingest_arrow.rs: build_parquet_ingest_sql accepts target_db; Arrow IPC uses scoped_search_path for binary COPY protocol - Merge mode + non-primary database rejected with clear error - Catalog updates only fire for primary/persistent targets The `persist: true` flag is syntactic sugar for `database: "persistent"`. If both are set, `database` wins. Ephemeral-only mode + persist returns InvalidArgument.
…export Wire per-tool database routing through the read/write tools: - QueryParams, ExecuteParams, ChartParams, SampleParams, DescribeParams, ExportParams gain `database: Option<String>` - query/execute/chart use scoped_search_path RAII guard for user-provided SQL. The guard restores search_path on drop (even on error) and logs warnings if restore fails. - execute uses require_writable=true (other read-only tools pass false) - New Engine::describe_tables_in / describe_table_in / sample_table_in query the target database via fully-qualified pg_catalog references (already validated working in two_db_model_tests) - export with database+table mode synthesizes a fully-qualified SELECT; with database+sql mode redirects search_path
- watch_directory rejects `database` and `persist` parameters with a clear InvalidArgument error explaining the pool architecture limitation. Users targeting persistent should use load_file with persist:true instead. - New tests/per_tool_database_tests.rs (15 tests) covers: - qualified_table() helper output and quote escaping - Engine::resolve_target_db semantics (None/empty/whitespace/persistent) - InvalidArgument when persistent requested in ephemeral-only mode - ingest_json / ingest_csv routing to persistent via target_db - persistent ingest survives engine recreate - default routing still lands in primary (backward-compat) - describe_tables_in / describe_table_in / sample_table_in - ScopedSearchPath redirect + restore behavior
- README: New "Working with both databases" subsection presents the ergonomic per-tool `database` / `persist` parameters as the preferred pattern, with fully-qualified SQL kept for power users. Documents the v1 limitations on load_files / watch_directory / merge mode. - CHANGELOG: Adds per-tool database parameter and persist shorthand under [Unreleased] with the explicit limitations section so users know which paths still require fully-qualified SQL.
Reconciled findings from both adversarial reviewers on the integrated diff: CRITICAL fixes: - Async ingest paths (ingest_csv_file_async, ingest_json_async, create_table_async, ingest_arrow_ipc_file_async) now correctly qualify SQL when `target_db` is set. The bugs were dormant — the tools that call them (load_files, watch_directory) reject non-primary targets in v1 — but the asymmetry would have surfaced silently when the pool architecture catches up. Sync and async paths are now symmetric. - export with format="hyper" + database now returns InvalidArgument. export_hyper always snapshots the primary database; silently accepting `database` here would produce a wrong .hyper file. MEDIUM fixes: - resolve_target_db now matches PERSISTENT_ALIAS case-insensitively, matching the existing LOCAL_ALIAS treatment. "Persistent", "PERSISTENT", "persistent" all canonicalize to the lowercase form. - README and CHANGELOG no longer list query_data/query_file as accepting `database` (those tools materialize inline data into their own temp table; the parameter wouldn't be meaningful). Test additions: - resolve_target_db_persistent_is_case_insensitive (4 variants) - resolve_target_db_persistent_uppercase_errors_in_ephemeral_only
6 tasks
StefanSteiner
added a commit
that referenced
this pull request
May 25, 2026
…se catalog (#32) * fix(tests): eliminate TestDaemon port-pick TOCTOU race The daemon_mode_persistent_engine_data_is_queryable test (and others using TestDaemon) periodically failed in CI with a 30s startup timeout. Root cause: find_free_port() bound port 0, read the assigned port, *dropped the listener*, then set HYPERDB_DAEMON_PORT for the daemon thread to bind that same port a moment later. Between drop and rebind another process could grab the port — daemon's bind would fail, run_daemon would Err out silently, and the test would hit the 30s timeout instead of failing fast with a useful message. Fix: pass port 0 to DaemonConfig and let HealthListener::bind(0) pick a free port itself. The actual port is reported back via the discovery file (info.health_port), which is what tests already read. The HYPERDB_DAEMON_PORT env var only matters for the spawn-from-scratch client path; once the daemon writes the discovery file, clients use its health_port directly. We still set HYPERDB_DAEMON_PORT="0" so the two `resolve_port_*` tests that explicitly inspect the env-var resolution path keep their isolation. Also: capture the daemon thread JoinHandle and check is_finished() on each iteration of the readiness loop, so future bind/spawn errors fail fast (~ms) with the real error rather than after the 30s timeout. * test(mcp): pre-flight cross-DB DML smoke battery Iter 0 of the "remove v1 limitations" plan. Verifies Hyper supports the five cross-database SQL shapes the merge-in-target-DB design depends on, before any plumbing work begins: 1. CREATE TABLE "alias"."public"."tmp" AS SELECT ... 2. DELETE FROM "alias"."public"."t" t USING "alias"."public"."tmp" s WHERE t.k = s.k 3. INSERT INTO "alias"."public"."t" SELECT * FROM "alias"."public"."tmp" 4. ALTER TABLE "alias"."public"."t" ADD COLUMN c TEXT 5. Qualified pg_catalog.{pg_tables,pg_attribute,pg_type,pg_class, pg_namespace} probes for table_exists_in / column_metadata_in. All five pass against a writable user-attached database, so the merge path's plan to keep the temp table in the target DB (rather than primary, which would require cross-DB DML) is viable. Marked #[ignore] so they only run on demand: cargo test -p hyperdb-mcp --test cross_db_dml_smoke -- --ignored * feat(mcp): cross-database merge mode Lifts the v1 limitation that rejected `load_file` with mode="merge" + a non-primary `database` parameter. The merge path now keeps the temp table inside the same database as the target — eliminating cross-DB DML entirely (the temp and final table share a DB) and unlocking merges into persistent or any user-attached writable DB. Engine helpers (engine.rs): - table_exists_in(target_db, table) — qualified pg_tables probe - column_metadata_in(target_db, table) — qualified pg_attribute join via the existing describe_columns_via_pg_catalog helper - alter_table_add_columns_in(target_db, table, cols) — qualified ALTER Each falls back to the un-suffixed Catalog-API path when target_db is None, preserving the fast path for unqualified merges. Merge path (ingest.rs): - tmp_opts inherits opts.target_db so the temp lands in the target DB - TempTableGuard records target_db and qualifies its DROP - All five SQL sites (table_exists, column_metadata×3, alter_table_add_columns) route through the new *_in variants - DELETE-USING and INSERT-SELECT use qualified_table against the target; RENAME TO emits "alias"."public"."tmp" RENAME TO "newname" (qualified source, unqualified target — PostgreSQL/Hyper semantics) Server (server.rs): - Drops the InvalidArgument rejection in load_file's handler - Removes the matching note from SetTableMetadataParams' doc-comment Tests (per_tool_database_tests.rs): - merge_into_persistent_creates_table_when_missing — rename path - merge_into_persistent_replaces_matching_and_appends_new — DELETE-USING + INSERT-SELECT path - merge_into_persistent_alters_when_incoming_has_new_column — ALTER ADD COLUMN path with NULL backfill The pre-flight smoke battery (cross_db_dml_smoke.rs, Iter 0) verified all five SQL shapes work against an attached writable DB before any plumbing. * feat(mcp): export format=hyper supports source_db Lifts the v1 limitation that rejected `export(format="hyper", database=X)` for non-primary X. The hyper-format export now snapshots whichever database the caller named, instead of always snapshotting the primary ephemeral workspace. ExportOptions gains `source_db: Option<String>`. `export_hyper` and `populate_export_target` thread it through to the cross-DB `CREATE TABLE AS SELECT` (one per user table per schema), with `None` falling back to `engine.primary_db_name()` so existing single-DB callers keep working unchanged. Server handler drops the up-front InvalidArgument rejection. The underlying cross-DB CTAS shape was verified by Iter 0's smoke battery before any plumbing. New test: `export_hyper_with_source_db_snapshots_persistent` writes distinct tables into primary and persistent, exports with `source_db=Some("persistent")`, opens the snapshot fresh, and asserts only the persistent table appears — primary's table is excluded as expected. Existing `ExportOptions { ... }` literals across export_tests.rs and integration_tests.rs each gain `source_db: None` for the preserved-behavior callers. * feat(mcp): per-target watcher pool + load_files persist + detach guard Lifts the v1 limitations that rejected `database`/`persist` on `load_files` and `watch_directory`. Both tools now build their connection pool against whichever .hyper file the resolved target points at — primary, persistent, or any user-attached writable database. Adds a guard so `detach_database` refuses to detach an alias an active watcher is using. Design choice: instead of qualifying SQL by alias on the pool's connections, the pool opens the target's .hyper file directly as its workspace. From those connections' perspective the target IS the primary database, so SQL stays unqualified. The user-supplied alias is used for path resolution and the catalog-skip decision only. watcher.rs: - New resolve_pool_workspace(eng, attachments, target_db) returns the file path the pool should open. Rejects unattached aliases and read-only attachments up front. - build_watcher_pool / rebuild_watcher_pool gain attachments + target_db parameters. - WatcherHandle gains target_db: Option<String> so detach can inspect it; also surfaced in the watcher status JSON. - start_watching gains attachments + target_db. server.rs: - load_files and watch_directory drop their up-front rejections. load_files resolves the target once and inlines the workspace- path resolution (mirrors resolve_pool_workspace) so it can build the pool against the right file. watch_directory threads the resolved target into start_watching. - detach_database enumerates active watchers and refuses to detach an alias any of them target. Error message names the directory to unwatch_directory first. - after_ingest_catalog_update gains target_db: Option<&str> and short-circuits non-persistent user-attached targets at debug level (per-DB catalog ships in a follow-up iter). - Doc-comments on LoadFilesParams / WatchDirectoryParams updated to reflect the new behavior. tests/watcher_tests.rs: - Bulk-update existing start_watching call sites (5) to pass Arc::new(AttachRegistry::new()) and target_db: None. - New watcher_ingests_into_persistent_target verifies a watcher with target_db=Some("persistent") writes rows visible at "persistent"."public"."events" but NOT in primary. This iteration removes 3 of the 4 documented v1 limitations; the remaining one (per-database _table_catalog) is the subject of the next iter. * feat(mcp): per-database _table_catalog Closes the catalog gap surfaced in PR #31 review. Every writable database (persistent + user-attached) now gets its own _table_catalog, lazily seeded on first ingest. The iter-3 placeholder that short-circuited non-persistent targets in after_ingest_catalog_update is removed. Pre-flight smoke tests (cross_db_dml_smoke.rs) verified that Hyper rejects PRIMARY KEY ("Index support is disabled"), ALTER TABLE ADD CONSTRAINT ("named constraints not implemented"), and INSERT … ON CONFLICT ("syntax error: got ON, expected FETCH, FOR, LIMIT, OFFSET"). The original plan for atomic upsert backed by a PK is therefore not viable. The DELETE+INSERT-in-transaction pattern stays. Within one MCP server engine, every catalog write runs inside HyperMcpServer::with_engine which holds the engine mutex for the duration, so concurrent ingests serialize at the catalog write boundary even though data writes run in parallel through the watcher pool's separate connections. Cross-process concurrency on the same .hyper file is out of scope. Three rejection tests (`expect_err`-asserting) document Hyper's limitations and serve as a regression alarm if a future Hyper release adds support. table_catalog.rs: - New *_in family: ensure_exists_in, upsert_stub_in, set_metadata_in, get_in, list_in, delete_for_in, reconcile_in, each taking target_db: Option<&str>. - Old names kept as 1-line wrappers passing None. - qualified_catalog_in resolves None/empty/"persistent" → persistent, Some(alias) → that alias verbatim. Case-insensitive on the persistent special-case. - resolve_catalog_alias is the alias-only sibling for callers that build their own SQL. - Internal helpers (user_tables, table_present, row_count_of, refresh_row_count) all gain _in variants. - set_metadata_in error message names the target DB when the target is a non-persistent alias. engine.rs: - catalog_present_cache: Mutex<Option<bool>> → Mutex<HashMap<String, bool>> keyed by lowercased alias. - catalog_present_in_persistent → catalog_present_in(alias, prober). - mark_catalog_present → mark_catalog_present_for(alias). - NEW clear_catalog_cache_for(alias). server.rs: - after_ingest_catalog_update routes via upsert_stub_in(target_db). - detach_database calls clear_catalog_cache_for inside with_engine on successful detach so a re-attach with different file/writability doesn't reuse a stale cache entry. attach.rs: - on_missing=create path uses ensure_exists_in(engine, Some(alias)) (the deprecated wrapper still works but the new name is preferred). tests: - cross_db_dml_smoke.rs: 3 new #[ignore]'d tests that document Hyper's rejection of PK / ADD CONSTRAINT / ON CONFLICT. - per_tool_database_tests.rs: 4 new tests covering ensure_exists_in + upsert_stub_in + get_in + reconcile_in routing into a user-attached writable DB. Each verifies the row lands in the user-attached catalog, NOT in persistent's. - two_db_model_tests.rs: catalog_presence_probe_is_cached rewritten for the HashMap variant; new clear_catalog_cache_for_invalidates_ alias_entry test. This is breaking at the engine API surface (catalog_present_in_* methods renamed; cache shape changed). Pre-1.0, deliberate. * feat(mcp): set_table_metadata accepts database parameter Closes the user-attached friendliness gap surfaced in PR #31 review. set_table_metadata now accepts a `database: Option<String>` field that routes the catalog write to that database's _table_catalog instead of always targeting persistent's. The handler resolves via `resolve_db(require_writable=true)`, so a read-only attachment is rejected up front with a clear "re-attach with writable:true" message — before any catalog write is attempted. Threads through to `table_catalog::set_metadata_in(target_db)` (landed in the prior iter), which lazily seeds the per-DB _table_catalog if absent. Works seamlessly with the recently- landed cross-DB ingest paths: load_data into a user-attached DB auto-stubs the catalog row, then set_table_metadata updates prose fields in that same DB's catalog. Two new tests: - set_metadata_in_routes_to_user_attached_db verifies the prose update lands in the user-attached DB's _table_catalog (visible via qualified SELECT) and not in persistent's. - set_metadata_in_missing_row_names_target_database_in_error asserts the TableNotFound message names the target DB so the LLM gets actionable guidance ("load it via load_data with database='bar' first") instead of a confusing missing-row error. * docs: update README + CHANGELOG for v1-limitations removal README: - The `database` parameter is now listed for `load_files`, `watch_directory`, and `set_table_metadata` (added in iters 3-5). - The `persist: true` shorthand list expanded to include `load_files` and `watch_directory`. - "v1 limitations" paragraph removed. - Per-database `_table_catalog` semantics documented: every writable database (persistent + user-attached) gets its own catalog, lazily seeded on first ingest. Pristine `.hyper` recipe updated to use `<alias>` since persistent is no longer the only option. - New "Detach safety" note documenting the watcher-active rejection on `detach_database`. CHANGELOG [Unreleased]: - "Limitations (deferred to a follow-up)" subsection removed. - New entries for cross-database merge, export-hyper source_db, load_files/watch_directory persist, per-database _table_catalog, detach guard, and set_table_metadata.database. - New "Changed (breaking — pre-1.0)" subsection lists the engine cache rename and the table_catalog deprecated wrapper. - "Performance" entry updated for the per-DB cache shape. * fix(mcp): post-merge reviewer findings (watcher catalog + side effects + docs) Final-sweep adversarial review surfaced six issues this commit fixes: C1 (CRITICAL): watcher ingest never updated _table_catalog. process_ready_with_recovery's success branch removed the .ready / data files and bumped stats but never called the catalog upsert. Effect: a watcher-driven ingest into persistent (or any user- attached writable target after iters 3-4) left the table un- catalogued. set_table_metadata against that table would error TableNotFound even though the data exists. Fix: take the engine mutex on the success branch and call upsert_stub_in with load_tool="watch_directory" + the resolved target_db. Errors are logged but never block the success bookkeeping. Test: watcher_ingests_into_persistent_target now also asserts the _table_catalog row exists with load_tool='watch_directory' after ingest. M1: set_metadata_in created _table_catalog as a side effect of returning EmptyData / TableNotFound errors. ensure_exists_in ran BEFORE the empty-fields validation and BEFORE the missing-row check, so a no-op or misguided call permanently mutated the target DB's schema. Fix: validate fields.is_empty() first; let get_in itself return None when the catalog table doesn't yet exist (no seed). Removes ensure_exists_in from set_metadata_in entirely — by the time we know we're writing, get_in has already returned an existing row, which means the catalog already exists. M2: the LLM-facing readme.rs (returned by get_readme) did not mention the persistent attachment, the database/persist tool parameters, or per-DB _table_catalog. From the model's perspective post-merge, the orientation document was stale. Added a "Workspace model" section listing the three database kinds (ephemeral / persistent / user-attached), the tools that accept database/persist, and the watcher / detach interaction. M3: stale "// short-circuits non-persistent targets (per-DB catalog ships in a follow-up iter)" comments at three sites in server.rs (load_data, load_file, load_files). Iter 4 is the follow-up; the helper now routes per-DB. Comments rewritten. m2: README.md had a byte-for-byte duplicated block of three sections (Daemon management / Recovery from hyperd crashes / Other behavioral flags) at lines 225-279. Pre-existing, surfaced during this review. Deleted the second copy. m5: top-level tool descriptions (the strings the LLM sees in tools/list) didn't mention the new database / persist capability on load_files, watch_directory, export, or set_table_metadata. The struct field doc-comments were correct but rmcp's JsonSchema-derived descriptions don't include them. Added a brief sentence to each of the four tool descriptions. Deferred to a follow-up PR (also surfaced in the same review): - M4: bootstrap reconcile only targets persistent; user-attached catalogs go stale over a long-lived session. Not introduced by this branch. - M5: alias canonicalization is case-sensitive in the registry but case-insensitive in the catalog cache. Latent footgun if a user attaches as "User_DB" and detaches as "user_db". Not introduced by this branch either.
Merged
8 tasks
StefanSteiner
added a commit
that referenced
this pull request
May 26, 2026
…ute reconcile, e2e harness (#33) * fix(mcp): canonicalize attach aliases to lowercase at attach time Pre-fix the registry stored aliases verbatim (case-sensitive `==`) while `Engine::catalog_present_cache` and the cache helpers all lowercased their keys. Reachable user scenario: 1. `attach_database(alias="User_DB", writable=true)` — registry stores `"User_DB"`. 2. `load_data(database="User_DB", ...)` — populates `"User_DB"."public"."_table_catalog"`. Cache key: `"user_db"`. 3. `detach_database(alias="user_db")` — registry says "no such alias" (case mismatch), returns `Ok(false)`. Cache stays populated under `"user_db"`. 4. `attach_database(alias="user_db", path=...)` to a different file — cache hit on `"user_db"` skips the existence probe. If the new file lacks `_table_catalog`, subsequent reads silently fail. Fix: lowercase at every alias boundary. - `AttachRegistry::attach` lowercases `req.alias` after `validate_alias`. - `AttachRegistry::get` and `detach` lowercase their input arg. - `Engine::resolve_target_db` lowercases non-persistent aliases so qualified SQL identifiers match the form `ATTACH DATABASE` used (Hyper is case-sensitive on quoted identifiers). - `qualified_catalog_in` and `resolve_catalog_alias` lowercase the user-attached branch. - `detach_database` watcher-active check simplifies from `eq_ignore_ascii_case` to `==` since both sides are now canonical. `validate_alias` is unchanged; error messages preserve user-typed casing for invalid input. Three new round-trip tests in `per_tool_database_tests.rs` cover: attach "User_DB" → list shows "user_db" + get("USER_DB") finds it; detach via differing-case alias succeeds; qualified catalog SQL written through user-typed mixed case lands in the canonical lowercase catalog row. Breaking change documented in CHANGELOG under "Changed (breaking — pre-1.0)". * fix(mcp): reconcile user-attached target's catalog after execute `after_execute_catalog_update` only reconciled persistent's `_table_catalog`. Raw DDL like `execute(database="foo", sql="DROP TABLE bar")` against a user-attached writable DB removed the table but left its stub row stranded in `"foo"."public"."_table_catalog"` indefinitely. Bootstrap reconcile and the post-execute reconcile both walked persistent only, so subsequent `describe(database="foo")` calls would keep listing the dropped table. Fix: thread the resolved `target_db` from the `execute` handler into `after_execute_catalog_update`, then reconcile persistent first and the user target if non-persistent. The `eq_ignore_ascii_case` guard on `PERSISTENT_ALIAS` is defensive — Iter 1 made `resolve_target_db` return canonical lowercase, but the helper handles a non-canonical string passed directly by anyone calling it in the future. The stale `// No-op in bare mode.` comment at the call site is rewritten to explain the new threading behavior; the `--bare` flag was removed in PR #32. New engine-level test `execute_drop_table_in_user_attached_reconciles_that_dbs_catalog` mirrors the post-fix reconcile pattern: stub a row in user_db's catalog, DROP the table via raw DDL, run both reconciles, and assert the stub is gone from user_db while persistent stays clean. CHANGELOG entry under "Fixed". * test(mcp): add end-to-end MCP test harness via tokio::io::duplex Adds a 9-test end-to-end harness that exercises tool calls through the full rmcp dispatch path — params deserialization, request-context plumbing, error mapping — that engine-level tests can't reach. Phase 3a investigation chose Option A (rmcp's own pattern from `tests/test_tool_macros.rs`): connect server + minimal `ClientHandler` across a `tokio::io::duplex(64KiB)` pair, then drive tools via the public `client.call_tool(CallToolRequestParams)` entry point. No faking of `Peer<RoleServer>` required. Adds `rmcp = { ..., features = ["client"] }` to dev-dependencies. Tests cover: Happy paths (PR #31 rejections lifted by PR #32): - tool_load_files_persist_via_router_now_works - tool_load_file_merge_database_now_works - tool_export_hyper_database_now_works - tool_watch_directory_persist_via_router_now_works PR #31 routing / rejection: - ephemeral_only_plus_persist_returns_invalid_argument - database_persistent_case_insensitive_routes_correctly (capital P) - persist_true_plus_database_local_lets_database_win Iter 4-5 paths: - tool_set_table_metadata_database_persistent - tool_detach_database_rejects_when_watcher_active Each test owns its own harness (server task + embedded hyperd via `with_no_daemon=true` + tempdir-backed persistent file), and the watcher tests `unwatch_directory` before shutdown so the background tokio task joins cleanly. Each happy-path test asserts BOTH the tool's success AND a downstream effect (rows in the right database, file produced) — the success flag alone could be true while the rows landed in the wrong place. Dropped from the originally-planned 13 tests: - case-insensitive `"Local"` (the persist+local-wins test already exercises case-insensitive Local resolution inside resolve_db). - readonly-attached set_table_metadata + user-attached set_table_metadata via router. Both are exercised at engine level in `per_tool_database_tests.rs`; the router layer just plumbs params through, no new behavior to verify. * fix(mcp): final-sweep follow-ups — copy_query canonicalize + execute reconcile gating Final pre-merge adversarial sweep against the integrated branch caught two cross-cutting issues that no individual iteration owned: 1. **CRITICAL — `copy_query` bypassed alias canonicalization.** Pre-fix, attaching as `"My_DB"` (which Iter 1's `AttachRegistry` now stores lowercased as `"my_db"`) followed by `copy_query(target_database="My_DB", ...)` rendered SQL referring to `"My_DB"."public"."t"` and failed with "database does not exist" because Hyper is case-sensitive on quoted identifiers. `copy_query` is the one tool that doesn't go through `resolve_db` / `Engine::resolve_target_db`, so the Iter 1 sweep missed it. Fix: lowercase `target_database` after the `LOCAL_ALIAS` filter so both the registry lookup AND `perform_copy` → `qualified_name` agree on the canonical form. New regression test `tool_copy_query_target_database_mixed_case_canonicalizes` in the E2E harness exercises the full attach-then-copy round-trip. 2. **MAJOR — `after_execute_catalog_update` ran on every `execute`, doubling under M4.** `reconcile_in` walks `pg_catalog.pg_tables` and runs `2N + 2` SQL round-trips for N existing tables. With Iter 2's M-target fan-out, `execute(database="user_db", ...)` did `4N + 4`. For a user with 50 tables in each of persistent + user_db, a single row-level UPDATE triggered ~204 catalog round-trips. INSERT / UPDATE / DELETE can't change the table set, so the reconcile is wasted work for those. Fix: gate the call on `is_structural_sql` — same predicate that already guards `notify_resource_list_changed` two lines later. Now the reconcile only fires for CREATE / DROP / ALTER / TRUNCATE / RENAME, which is the semantic justification for it in the first place. Also fixed two `#[expect(clippy::unused_self)]` reasons that said "kept for symmetry" but actually mean "method-call dispatch requires &self even though the body uses only the engine argument" — line-level reviewer flagged the reasons as misleading. CHANGELOG updated with both fixes; the structural-gate note rolls into the existing M4 entry. * fix(tests): bump TestDaemon startup timeout from 30s to 60s The user reported intermittent failures of `daemon_mode_two_engines_share_same_hyperd` on macOS: panicked at hyperdb-mcp/tests/daemon_tests.rs:818: TestDaemon did not start within 30 seconds Root cause: the outer wait was the same length as `HyperProcess::new`'s own internal 30s timeout for the hyperd-callback connection. Under macOS load (parallel tests, hyperd subprocess churn, system resource pressure), hyperd startup can approach or exceed that limit. When it does, the outer assertion fires before the inner timeout can return its actual error message — masking the real cause behind a generic "did not start" panic. Bumping the outer timeout to 60s ensures the inner timeout has room to surface a real error via the existing `daemon_handle.is_finished()` branch. If hyperd genuinely can't start, we'll see "daemon thread errored: <real reason>" instead of the unhelpful generic message. Also tightens the panic message to clarify the failure mode (daemon thread still running, no discovery file written) so the next reader knows whether to look at hyperd-startup vs. discovery-write paths.
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds an optional
databaseparameter on every data-touching MCP tool plus apersist: trueshorthand on ingest tools, so the LLM can route operations to the persistent database (or any user-attached writable DB) without writing fully-qualified SQL.This is the deferred Iteration 6 from #29 (the ephemeral-primary + persistent-attached two-database model). Before this PR, the LLM had to reach for fully-qualified SQL like
INSERT INTO "persistent"."public"."t" ...to use persistent storage; now it can just passpersist: true(ordatabase: "persistent") onload_data/load_file.What's covered
query,execute,chartdatabase— temporarily redirectschema_search_pathfor user SQL via RAII guardload_data,load_filedatabase+persist— thread alias intoIngestOptions; ingest SQL is fully qualifieddescribe,sampledatabase— tool-built SQL uses fully-qualified"db"."public"."table"referencesexportdatabase— table-mode synthesizes qualified SELECT; sql-mode redirects search path;format=hyperrejects with InvalidArgumentwatch_directory,load_filesload_file(persist:true)insteadDesign
qualified_name/qualified_tableto produce"db"."public"."table"directly.Engine::scoped_search_path(alias)returns an RAII guard that SETs the search path on creation and restores to the primary on Drop. The engine mutex is held bywith_enginefor the entire closure, so concurrent tool calls are serialized.persist: trueis sugar fordatabase: "persistent". If both are set,databasewins. With--ephemeral-only,persist: truereturns InvalidArgument.copy_querypattern). User-attached databases manage their own metadata."persistent"(and the existing"local").v1 limitations (deferred to a follow-up)
load_filesandwatch_directoryrejectdatabase/persist— their connection pool is bound to the primary file. Workaround: useload_filewithpersist: true.load_filewithmode: "merge"rejects non-primarydatabase— the merge implementation uses a temp table and cross-database DML isn't yet verified.exportwithformat: "hyper"rejects non-primarydatabase—export_hypersnapshots the primary database connection; silently accepting the parameter would produce a wrong file.All three rejection paths return
InvalidArgumentwith explicit instructions on the workaround.Adversarial review
Both reviewers ran in parallel against the integrated diff and surfaced concrete issues that landed in the final commit:
Fast reviewer (3 critical): async ingest paths (
ingest_csv_file_async,ingest_json_async,create_table_async) were missing the SQL-qualification fixes that the sync paths got. Bugs were dormant behind theload_files/watch_directoryrejection guards but would have surfaced silently when the pool architecture catches up. Fixed in682148f.Deep reviewer (1 critical, 2 medium):
format=hyperexport silently exported the primary even withdatabaseset — fixed via tool-boundary rejection.query_data/query_fileas acceptingdatabase— they don't (they materialize inline data into a temp table). Docs corrected."Persistent"(case-insensitive) wasn't accepted while"Local"was.Engine::resolve_target_dbnow matches case-insensitively and canonicalizes to the lowercase form.Lower-severity findings (M3 search-path-restore-failure recovery, M5 catalog-tracked signal in tool response, L1–L6 polish) are tracked as follow-up items rather than blockers — they describe edge-case improvements rather than correctness gaps.
Test plan
cargo test -p hyperdb-mcp— full suite green, 17 new tests inper_tool_database_tests.rscargo clippy --workspace --tests -- -D warnings— cleancargo fmt --check— cleanload_data(table="x", data="[{\"a\":1}]", persist=true)→ row in persistentquery(sql="SELECT * FROM x", database="persistent")→ returns the rowdescribe(database="persistent")→ lists "x"export(format="hyper", path="/tmp/o.hyper", database="persistent")→ InvalidArgument with workaround instructionsload_files(persist=true, files=[...])→ InvalidArgument pointing atload_fileIteration breakdown (8 commits)
962eb2eIngestOptions::target_db,qualified_table(),ScopedSearchPathRAII,resolve_db()helper,Engine::create_table_in()ffb27b3load_data,load_fileacceptdatabase/persist; sync ingest SQL fully qualified;load_filesrejects non-primary; merge+database rejected74075c1query,execute,chart,describe,sample,exportacceptdatabase; newEngine::describe_tables_in/describe_table_in/sample_table_inquery via fully-qualifiedpg_catalog66186a4tests/per_tool_database_tests.rs(15 tests)5dfcec805a76f8682148fformat=hyperrejection, case-insensitive PERSISTENT_ALIAS, doc accuracy