feat(mcp): ephemeral-primary + persistent-attached two-database model #29
Merged
StefanSteiner merged 9 commits on May 25, 2026
…t path

Begins migrating hyperdb-mcp toward an ephemeral-primary, persistent-attached session model. Each engine now holds two .hyper files at all times (when not in --ephemeral-only mode):

- Ephemeral primary: $TMPDIR/hyperdb-mcp-<pid>-<seq>/scratch.hyper. Created fresh per-engine, deleted on Drop. The connection is bound here; unqualified SQL routes here.
- Persistent attachment: at the platform-default location (or user-supplied path), attached under the reserved alias "persistent" during Engine::new. Survives across sessions.

Per-engine sequence number lets multiple Engines coexist in the same PID (parallel test runners, restart-after-ConnectionLost) without colliding.

New module:
- paths: cross-platform resolution for the persistent-db default location using `dirs::data_dir()`. Override via HYPERDB_PERSISTENT_DB env var.

Engine API changes:
- workspace_path() -> ephemeral_path() and new persistent_path()
- is_persistent() -> has_persistent() + persistent_was_just_created()
- new resolve_target_db() resolves an optional database alias for tools
- PERSISTENT_ALIAS = "persistent" const exposed
- Drop always cleans up ephemeral; persistent stays where the user put it

attach::reset_search_path now respects the always-on persistent attachment: it re-pins to the primary's name instead of issuing RESET (which would leave the persistent attachment without a working unqualified-name resolver).

Tests are all passing except for two clusters being addressed in later iterations:
- saved_queries_tests (2 failures): WorkspaceStore still targets primary; iteration 7 will move it to the persistent attachment.
- table_catalog_tests (2 failures): catalog ensure_exists still targets primary; iteration 5 will make it presence-conditional and per-DB.

This is the second commit in a multi-step migration; --bare and --workspace CLI flags still work (deprecation comes in iteration 3).
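The per-engine path naming and cleanup-on-Drop described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual Engine code: `EphemeralDir`, `NEXT_SEQ`, and `create_in` are hypothetical names, and the real engine also creates and opens the `.hyper` file, which is omitted here.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter (hypothetical); each new scratch directory takes the next value,
// so several engines in one PID never share a directory.
static NEXT_SEQ: AtomicU64 = AtomicU64::new(0);

/// Hypothetical stand-in for the ephemeral half of an Engine.
pub struct EphemeralDir {
    dir: PathBuf,
}

impl EphemeralDir {
    /// Creates `<base>/hyperdb-mcp-<pid>-<seq>/` and returns a guard that owns it.
    pub fn create_in(base: &Path) -> std::io::Result<Self> {
        let seq = NEXT_SEQ.fetch_add(1, Ordering::Relaxed);
        let dir = base.join(format!("hyperdb-mcp-{}-{}", std::process::id(), seq));
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    /// Path where the scratch database file would live.
    pub fn scratch_path(&self) -> PathBuf {
        self.dir.join("scratch.hyper")
    }
}

impl Drop for EphemeralDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because Drop cannot report failure.
        let _ = fs::remove_dir_all(&self.dir);
    }
}
```

Calling `EphemeralDir::create_in(&std::env::temp_dir())` twice in one process yields two distinct directories, and each disappears when its guard goes out of scope.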
CLI changes:
- New `--persistent-db <PATH>`: replaces `--workspace`. Defaults to the platform data dir (or HYPERDB_PERSISTENT_DB env var). The deprecated `--workspace` is hidden but still accepted with a stderr warning; passing both is an error.
- New `--ephemeral-only`: skip persistent attachment entirely. Saved queries fall back to in-memory storage.
- Removed `--bare`. Catalog creation is now uniform (always seed when MCP creates a fresh .hyper file; never touch an existing file's catalog), so the opt-out flag became redundant. Users wanting a pristine .hyper for export can DROP TABLE _table_catalog after creation; subsequent opens won't recreate it.

HyperMcpServer:
- new() and with_no_daemon() drop the `bare` parameter.
- is_bare() removed.
- Saved-query store selection becomes "persistent if available, session otherwise": same logic, but driven by --ephemeral-only instead of --bare.
- AttachRegistry no longer takes a catalog policy; `seed_catalog_on_create` becomes the constant default.

`_table_catalog` is now treated as an internal table by `is_internal_table` so it doesn't appear in user-visible `describe_tables` output or `total_rows`. The catalog's own `table_present`/`user_tables` helpers go through the raw Catalog directly to bypass that filter.

attach::reset_search_path keeps the primary-name pin when a default persistent attachment is in place: RESET to "$single" would break unqualified resolution while persistent stays attached.

Tests updated:
- All `HyperMcpServer::new(path, ro, bare)` callsites lose their bare arg.
- Two `--bare`-specific tests deleted (bare_server_does_not_create_catalog, is_bare_reflects_constructor_argument).
- One detach-search-path test updated to expect the primary-name pin instead of "$single".
- Test helpers writing to "the workspace" updated to write into the persistent attachment via fully-qualified SQL.
- `table_exists` test helper goes through Catalog directly so it can see internal tables (which the new is_internal_table now filters out).

Five tests still failing in saved_queries and table_catalog:
- saved_queries: WorkspaceStore still targets primary; iteration 7.
- table_catalog: ensure_exists/reconcile/upsert_stub still target primary; iteration 5 will route them to the persistent attachment.
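The flag rules in this commit can be sketched as one resolution function. This is an illustrative sketch, not the project's argument parser: `resolve_persistent` and `PersistentChoice` are hypothetical names, and the precedence shown (explicit flag, then the env var, then the platform default) is an assumption, as is letting `--ephemeral-only` win over the deprecated `--workspace`.

```rust
use std::path::PathBuf;

/// Outcome of flag resolution: where the persistent database lives, if anywhere.
#[derive(Debug, PartialEq)]
pub enum PersistentChoice {
    EphemeralOnly,
    Path(PathBuf),
}

/// Resolves the persistent-db location from already-parsed inputs (hypothetical helper).
/// `env_override` models HYPERDB_PERSISTENT_DB; `platform_default` models the data-dir default.
pub fn resolve_persistent(
    persistent_db: Option<PathBuf>,
    workspace: Option<PathBuf>,
    ephemeral_only: bool,
    env_override: Option<PathBuf>,
    platform_default: PathBuf,
) -> Result<PersistentChoice, String> {
    // Passing both the new flag and its deprecated alias is an error.
    if persistent_db.is_some() && workspace.is_some() {
        return Err("--persistent-db and --workspace cannot be combined".to_string());
    }
    // Asking for no persistent database while also naming one is contradictory.
    if ephemeral_only && persistent_db.is_some() {
        return Err("--ephemeral-only cannot be combined with --persistent-db".to_string());
    }
    if ephemeral_only {
        return Ok(PersistentChoice::EphemeralOnly);
    }
    if workspace.is_some() {
        // The deprecated alias still works but warns on stderr.
        eprintln!("warning: --workspace is deprecated; use --persistent-db");
    }
    // Assumed precedence: explicit flag, then env var, then platform default.
    let path = persistent_db
        .or(workspace)
        .or(env_override)
        .unwrap_or(platform_default);
    Ok(PersistentChoice::Path(path))
}
```

With no flags and no env var, the function returns the platform default. Supplying only `workspace` returns that path after printing the warning.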
Now that --bare is gone, AttachRegistry's seed_catalog_on_create flag is always true. Remove the field, the with_catalog_policy() constructor, and the conditional in attach(). Seeding now depends only on whether MCP just created the file (file_was_created), matching the same uniform policy the engine uses for the default persistent attachment.

The registry continues to hold *user-attached* databases only. The default persistent attachment is owned by Engine itself and isn't tracked in the registry; replay-on-reconnect re-issues only the user attaches.

Tests:
- on_missing_create_does_not_seed_under_bare_policy deleted (the bare policy no longer exists; the matching positive test for non-bare policy stays as the canonical create-+-seed regression).
The catalog tracks tables the user wants to keep around, i.e. tables in the persistent attachment. Ephemeral scratch tables aren't worth cataloging because the database is replaced every session. So:

- ensure_exists / list / get / upsert_stub / set_metadata / delete_for / reconcile / refresh_row_count all qualify SQL with "persistent"."public"."_table_catalog".
- When no persistent attachment is present (--ephemeral-only), all catalog operations no-op gracefully (Ok with empty/None) instead of erroring; set_metadata is the only path that surfaces a clear ReadOnlyViolation since the user's intent there is mutation.
- user_tables / table_present / row_count_of probe persistent's pg_catalog.pg_tables directly via fully-qualified SQL.
- Removed HYPERDB_NAMED_INTERNAL_TABLES from is_internal_table: the catalog never appears in describe_tables now (which only enumerates the connection's primary, i.e. ephemeral), so no filter is needed.

Tests:
- Catalog tests update their fixtures to seed user tables in "persistent"."public". (sed-driven batch update: every CREATE TABLE / INSERT INTO in those tests now qualifies the target db).
- attach_tests.copy_create_stubs_table_catalog_on_primary_workspace asserts catalog presence via persistent's pg_tables.

Two saved_queries tests still failing pending iteration 7.
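The "no-op gracefully, except set_metadata" rule can be illustrated with an in-memory stand-in. This is a sketch under stated assumptions: the `Catalog` and `CatalogError` types below are hypothetical, the rows live in a `Vec` rather than in `"persistent"."public"."_table_catalog"`, and only the `ReadOnlyViolation` variant named above is modeled.

```rust
/// Hypothetical error type; only the variant named in the text is modeled.
#[derive(Debug, PartialEq)]
pub enum CatalogError {
    ReadOnlyViolation(String),
}

/// Hypothetical catalog handle: `rows` is `None` when no persistent database is attached.
pub struct Catalog {
    rows: Option<Vec<(String, String)>>, // (table name, description)
}

impl Catalog {
    pub fn ephemeral_only() -> Self {
        Self { rows: None }
    }

    pub fn with_persistent() -> Self {
        Self { rows: Some(Vec::new()) }
    }

    /// Reads succeed with an empty list when there is no persistent attachment.
    pub fn list(&self) -> Result<Vec<String>, CatalogError> {
        let names: Vec<String> = match &self.rows {
            Some(rows) => rows.iter().map(|row| row.0.clone()).collect(),
            None => Vec::new(),
        };
        Ok(names)
    }

    /// Lookups succeed with `None` when there is no persistent attachment.
    pub fn get(&self, table: &str) -> Result<Option<String>, CatalogError> {
        let found = match &self.rows {
            Some(rows) => rows.iter().find(|row| row.0 == table).map(|row| row.1.clone()),
            None => None,
        };
        Ok(found)
    }

    /// Explicit mutation is the one path that reports an error instead of no-op'ing.
    pub fn set_metadata(&mut self, table: &str, description: &str) -> Result<(), CatalogError> {
        let rows = self.rows.as_mut().ok_or_else(|| {
            CatalogError::ReadOnlyViolation("no persistent database attached".to_string())
        })?;
        match rows.iter_mut().find(|row| row.0 == table) {
            Some(row) => row.1 = description.to_string(),
            None => rows.push((table.to_string(), description.to_string())),
        }
        Ok(())
    }
}
```

The asymmetry is deliberate in this sketch: background bookkeeping stays silent in `--ephemeral-only` mode, while a user-initiated write gets a clear error.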
WorkspaceStore now writes _hyperdb_saved_queries into the persistent
attachment ("persistent"."public"."_hyperdb_saved_queries") instead
of the connection's primary database. This matches user expectations:
saved queries are reference material that should outlive a single
session, and the persistent attachment is where curated, long-lived
data lives in the new model.
build_store remains driven by 'persistent path was supplied' — the
behavior is unchanged from the user's perspective. --ephemeral-only
sessions still get SessionStore (in-memory, dies with the process).
Eight new tests in tests/two_db_model_tests.rs covering the core contracts of the new model:

- Engine::new(Some(path)) attaches the file as 'persistent' and has_persistent() reports true.
- Engine::new(None) produces an ephemeral-only engine; the 'persistent' alias is genuinely absent.
- Each Engine gets a distinct ephemeral path even when multiple Engines coexist in the same PID (parallel test runners, embedded uses).
- Persistent writes survive engine drop and are visible on recreate.
- Ephemeral writes are discarded on drop (the entire point).
- resolve_target_db routes None -> primary, 'persistent' -> persistent when present, errors with InvalidArgument when --ephemeral-only.
- Engine::status() exposes both database paths and the has_persistent flag. ephemeral_only mode reports persistent_path = null.
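The `resolve_target_db` contract listed above can be sketched as a small pure function. The signature and the `TargetDb` / `ResolveError` types are assumptions made for illustration. How the real function treats other aliases (for example user-attached databases) is not described here, so this sketch simply rejects them.

```rust
pub const PERSISTENT_ALIAS: &str = "persistent";

/// Hypothetical result type: which database a tool call should target.
#[derive(Debug, PartialEq)]
pub enum TargetDb {
    Primary,
    Persistent,
}

/// Hypothetical error type; only the variant named in the text is modeled.
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    InvalidArgument(String),
}

/// Routes an optional alias: None -> primary, "persistent" -> persistent when attached.
pub fn resolve_target_db(
    alias: Option<&str>,
    has_persistent: bool,
) -> Result<TargetDb, ResolveError> {
    match alias {
        None => Ok(TargetDb::Primary),
        Some(PERSISTENT_ALIAS) if has_persistent => Ok(TargetDb::Persistent),
        Some(PERSISTENT_ALIAS) => Err(ResolveError::InvalidArgument(
            "no persistent database is attached (--ephemeral-only)".to_string(),
        )),
        // Assumption: any other alias is rejected in this sketch.
        Some(other) => Err(ResolveError::InvalidArgument(format!(
            "unknown database alias: {}",
            other
        ))),
    }
}
```

For example, `resolve_target_db(Some("persistent"), false)` returns the `InvalidArgument` error, mirroring the `--ephemeral-only` case.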
README:
- Operating Modes section now leads with the two-database concept and explains how to target either DB from SQL via fully-qualified names.
- New 'Database storage' table documents --persistent-db default, --ephemeral-only, and the deprecated --workspace alias.
- Saved-queries persistence note updated (queries now land in the persistent attachment automatically).
- CLI Reference rewritten to match the actual flag set; --bare removed.
- Examples migrated from --workspace to --persistent-db.

DEVELOPMENT.md:
- 'Workspace Modes Internals' replaced with 'Two-Database Engine Model' covering the per-engine ephemeral path naming, the schema_search_path pin to the primary, and the catalog/saved-queries routing helpers.

CHANGELOG.md:
- Added entries for the two-database engine model, platform-default persistent path, --persistent-db / --ephemeral-only flags, and the catalog / saved-queries persistence move.
- Added 'Removed' section documenting --bare retirement.
table_catalog::table_present previously ran a fresh "persistent".pg_catalog.pg_tables probe on every catalog read/write (get, list, delete_for, set_metadata, upsert_stub via ensure_exists). For workloads that touch the catalog frequently (every ingest, every DDL), that's one round-trip per call after the catalog has clearly been created.

Add a catalog_present_cache: Mutex<Option<bool>> on Engine. The cache:
- Lives for the engine's lifetime, which is the right TTL: a ConnectionLost reconnect builds a fresh Engine, naturally resetting the cache.
- Short-circuits to Ok(false) in --ephemeral-only mode without running the probe at all.
- Gets primed to Some(true) by ensure_exists immediately after CREATE TABLE IF NOT EXISTS so the next catalog op skips the probe.
- Returns the cached value on every subsequent call until the engine is dropped.

Two new tests in two_db_model_tests verify the cache:
- catalog_presence_probe_is_cached: probe runs once, mark_catalog_present flips to true without running the probe again.
- catalog_presence_short_circuits_in_ephemeral_only: probe never runs when has_persistent() is false.
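The caching behavior described above can be sketched with a `Mutex<Option<bool>>` and a caller-supplied probe. This is an illustration only: `CatalogPresence`, `is_present`, and `mark_present` are hypothetical names, and the probe closure stands in for the `pg_tables` query. Holding the lock across the probe is a simplification that keeps concurrent callers from probing twice.

```rust
use std::sync::Mutex;

/// Hypothetical holder for the per-engine catalog-presence cache.
pub struct CatalogPresence {
    has_persistent: bool,
    cache: Mutex<Option<bool>>,
}

impl CatalogPresence {
    pub fn new(has_persistent: bool) -> Self {
        Self { has_persistent, cache: Mutex::new(None) }
    }

    /// Returns whether the catalog table exists, running `probe` at most once.
    pub fn is_present<F>(&self, probe: F) -> Result<bool, String>
    where
        F: FnOnce() -> Result<bool, String>,
    {
        // Ephemeral-only: there is nothing to probe, so answer without a round-trip.
        if !self.has_persistent {
            return Ok(false);
        }
        let mut slot = self.cache.lock().unwrap();
        if let Some(cached) = *slot {
            return Ok(cached);
        }
        // First call: run the probe and remember its answer (errors are not cached).
        let present = probe()?;
        *slot = Some(present);
        Ok(present)
    }

    /// Primes the cache after the catalog table has just been created.
    pub fn mark_present(&self) {
        *self.cache.lock().unwrap() = Some(true);
    }
}
```

A caller in the position of `ensure_exists` would call `mark_present()` right after its `CREATE TABLE IF NOT EXISTS`, so the next catalog operation skips the probe.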
When the daemon restarts hyperd, every connection in a watcher's pool becomes invalid. Previously the watcher would route every subsequent file to 'failed/' until the user noticed and re-issued watch_directory.

Now:
- The watcher's pool lives behind Arc<RwLock<Arc<Pool>>> so it can be swapped atomically.
- Each per-file ingest reads the current pool from the slot, calls ingest_one_ready_file, and inspects the result.
- If the result is a connection-lost error (per is_connection_lost), rebuild_watcher_pool builds a fresh Pool from the engine's *current* endpoint and the ingest retries exactly once on the new pool.
- Persistent failures (the retry also fails) fall through to the existing 'failed/' move logic so a permanently-broken file doesn't keep the watcher pinned in retry loops.
- The initial-sweep path uses the same recovery wrapper so a watcher registered just after a hyperd hiccup still ingests the backlog.

Internal refactor:
- Pure ingest path extracted into ingest_one_ready_file (returns Result<u64, McpError> with no file-system side effects).
- process_ready_with_recovery wraps it: handles symlink rejection, the retry-on-connection-lost loop, and the success/failure file moves. The old process_ready_async is gone in favor of these two.

The DEVELOPMENT.md known-limitation entry for watchers is updated to reflect the new behavior; CHANGELOG gets a Fixed entry.
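The swap-and-retry-once flow can be sketched with standard-library types. This is a simplified, synchronous illustration rather than the watcher's code: `Pool`, `IngestError`, and `ingest_with_recovery` are hypothetical stand-ins, the closures replace `ingest_one_ready_file` and `rebuild_watcher_pool`, and the file moves to 'failed/' are left to the caller.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a connection pool; `generation` tells rebuilt pools apart.
pub struct Pool {
    pub generation: u32,
}

/// Hypothetical error type distinguishing connection loss from other failures.
#[derive(Debug, PartialEq)]
pub enum IngestError {
    ConnectionLost,
    Other(String),
}

/// The shared slot: the outer Arc is shared by tasks, the inner Arc is the swappable pool.
pub type PoolSlot = Arc<RwLock<Arc<Pool>>>;

/// Runs `ingest` on the current pool; on ConnectionLost, installs a rebuilt pool and retries once.
pub fn ingest_with_recovery<I, R>(
    slot: &PoolSlot,
    mut ingest: I,
    rebuild: R,
) -> Result<u64, IngestError>
where
    I: FnMut(&Pool) -> Result<u64, IngestError>,
    R: FnOnce() -> Pool,
{
    // Clone the inner Arc so the read lock is released before the ingest runs.
    let current: Arc<Pool> = slot.read().unwrap().clone();
    match ingest(&*current) {
        Err(IngestError::ConnectionLost) => {
            let fresh = Arc::new(rebuild());
            // Swap the slot so later files also use the new pool.
            *slot.write().unwrap() = Arc::clone(&fresh);
            // Exactly one retry; a second failure is returned to the caller.
            ingest(&*fresh)
        }
        other => other,
    }
}
```

Because the slot is swapped before the retry, a successful recovery benefits every later file, while an error from the retry is what the caller would use to move the file to 'failed/'.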
Summary
Reshapes hyperdb-mcp around an ephemeral-primary, persistent-attached session model. Every session now has both:
- Ephemeral primary at `$TMPDIR/hyperdb-mcp-<pid>-<seq>/scratch.hyper`: created fresh per-engine, deleted on Drop. The connection is bound here; unqualified SQL routes here. This is the LLM's scratch space.
- Persistent attachment under the alias `"persistent"`: at the platform-default location (`~/Library/Application Support/hyperdb/workspace.hyper` on macOS, `~/.local/share/hyperdb/workspace.hyper` on Linux, `%APPDATA%\hyperdb\workspace.hyper` on Windows) or wherever `--persistent-db <PATH>` points. Survives across sessions.

CLI changes
- `--persistent-db <PATH>`: replaces `--workspace`. Defaults to the platform data dir. Override via `HYPERDB_PERSISTENT_DB` env var.
- `--ephemeral-only`: skip persistent attachment entirely. Saved queries fall back to in-memory storage.
- `--workspace <PATH>`: still accepted (hidden in `--help`) with a stderr warning. Will be removed.
- Removed `--bare`. Catalog seeding is now uniform: created when MCP creates a fresh `.hyper`, never touched on existing files. Users wanting a pristine `.hyper` for export can `DROP TABLE _table_catalog` once after creation.

Targeting either database from SQL
Tool calls default to the ephemeral primary. To reach the persistent attachment, use fully-qualified table references:
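As a rough illustration of what a fully-qualified reference looks like, the helpers below build `"database"."schema"."table"` strings and a copy statement of the same shape as the `INSERT INTO "persistent"."public"."x" SELECT * FROM x` example in the migration notes. The function names are hypothetical and not part of hyperdb-mcp. Doubling embedded double quotes is the standard SQL identifier-quoting rule and is assumed to apply here.

```rust
/// Quotes one SQL identifier, doubling any embedded double quotes.
pub fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Builds a three-part "database"."schema"."table" reference.
pub fn qualified(db: &str, schema: &str, table: &str) -> String {
    format!("{}.{}.{}", quote_ident(db), quote_ident(schema), quote_ident(table))
}

/// Builds a statement copying an unqualified (ephemeral) table into the persistent attachment.
/// Assumes the target table already exists in "persistent"."public".
pub fn copy_to_persistent_sql(table: &str) -> String {
    format!(
        "INSERT INTO {} SELECT * FROM {}",
        qualified("persistent", "public", table),
        quote_ident(table)
    )
}
```

The unqualified name on the right-hand side resolves against the ephemeral primary, while the three-part name on the left reaches the persistent attachment.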
`_table_catalog` and `_hyperdb_saved_queries` now live in the persistent attachment automatically: no flag toggling, no manual migration.

Architecture
Test results
All 442 tests pass across the workspace (parallel + sequential, hyperdb-mcp + hyperdb-api + sea-query-hyperdb + bootstrap).
10 new tests in `tests/two_db_model_tests.rs` cover the core contracts: engine attachment shape, ephemeral-only mode, ephemeral path uniqueness across engines, persistent writes surviving recreate, ephemeral writes vanishing on drop, `resolve_target_db` routing, status JSON shape, and the per-engine catalog-presence cache.

Iteration breakdown (9 commits)
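| Commit | Change |
| --- | --- |
| `b5def2b` | Two-database engine model (`paths` module, `dirs` dep) |
| `c2a7046` | CLI: `--persistent-db`, `--ephemeral-only`, `--workspace` deprecation, `--bare` removal |
| `b552252` | Remove `seed_catalog_on_create` from AttachRegistry (always seed on create now) |
| `532133e` | `_table_catalog` routed to persistent via fully-qualified SQL; `pg_catalog.pg_tables` probes for presence |
| `ad61b27` | Saved queries (`WorkspaceStore`) routed to persistent |
| `964291b` | Tests for the two-database model (`tests/two_db_model_tests.rs`) |
| `cbd3635` | README, DEVELOPMENT.md, and CHANGELOG.md updates |
| `a6d31f7` | `_table_catalog` presence cache: primed by `ensure_exists`, reused by every catalog op for the engine's lifetime |
| `fd21d53` | Watcher pool behind `Arc<RwLock<Arc<Pool>>>`; per-file ingest rebuilds the pool and retries once on connection-lost errors |

Deferred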
A per-tool `database` parameter on `query`/`execute`/`describe`/etc. was originally part of this work (Iteration 6 in the plan) but is deferred to a follow-up PR. The current state lets the LLM target either database via fully-qualified SQL, which covers the most common cases. The per-tool parameter (and a `persist: true` flag on `load_data`/`load_file`) is roughly 200 LOC of plumbing across ingest paths and 24 tool handlers; it deserves its own focused PR.

Test plan
- `cargo test --workspace`: 442/442 passing
- `cargo clippy -p hyperdb-mcp --tests`: clean
- `cargo fmt --check`: clean
- `--help` shows new flags; `--workspace` emits deprecation warning; `--persistent-db <PATH>` overrides default; `--ephemeral-only` errors when combined with `--persistent-db`.
- `save_query` from session A is visible to session B (different MCP client) using the same persistent file.

Migration notes for users
- `--bare` users: drop the flag. If you want a clean `.hyper` without `_table_catalog`, run once normally, then `DROP TABLE "persistent"."public"."_table_catalog"`. Subsequent opens won't recreate it.
- `--workspace` users: rename to `--persistent-db`. Old flag still works with a warning.
- Users who relied on `load_data`/`load_file` ending up in the `--workspace` file: that data now lands in ephemeral by default. The next iteration adds a `persist: true` flag for ingest tools; until then, use `query`/`execute` with fully-qualified SQL like `INSERT INTO "persistent"."public"."x" SELECT * FROM x`.