feat(mcp): ephemeral-primary + persistent-attached two-database model #29

Merged: StefanSteiner merged 9 commits into tableau:main from StefanSteiner:ssteiner/mcp-simplify-persistent on May 25, 2026
@StefanSteiner
Contributor

Summary

Reshapes hyperdb-mcp around an ephemeral-primary, persistent-attached session model. Every session now has both:

  • An ephemeral primary at $TMPDIR/hyperdb-mcp-<pid>-<seq>/scratch.hyper — created fresh per-engine, deleted on Drop. The connection is bound here; unqualified SQL routes here. This is the LLM's scratch space.
  • A persistent attachment under the alias "persistent" — at the platform-default location (~/Library/Application Support/hyperdb/workspace.hyper on macOS, ~/.local/share/hyperdb/workspace.hyper on Linux, %APPDATA%\hyperdb\workspace.hyper on Windows) or wherever --persistent-db <PATH> points. Survives across sessions.

CLI changes

  • New --persistent-db <PATH>: replaces --workspace. Defaults to the platform data dir. Override via HYPERDB_PERSISTENT_DB env var.
  • New --ephemeral-only: skip persistent attachment entirely. Saved queries fall back to in-memory storage.
  • Deprecated --workspace <PATH>: still accepted (hidden in --help) with a stderr warning. Will be removed.
  • Removed --bare. Catalog seeding is now uniform: created when MCP creates a fresh .hyper, never touched on existing files. Users wanting a pristine .hyper for export can run DROP TABLE "persistent"."public"."_table_catalog" once after creation.
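
The flag rules above can be sketched as a small reconciliation step. This is only an illustration, not the PR's actual parser code: the RawFlags and Storage types and the reconcile function are hypothetical names, and rejecting --ephemeral-only together with the deprecated --workspace is an assumption (the test plan only states the conflict with --persistent-db).

```rust
use std::path::PathBuf;

/// Hypothetical raw flag values, as an argument parser might hand them over.
#[derive(Debug, Default)]
pub struct RawFlags {
    pub persistent_db: Option<PathBuf>, // --persistent-db <PATH>
    pub workspace: Option<PathBuf>,     // deprecated --workspace <PATH>
    pub ephemeral_only: bool,           // --ephemeral-only
}

/// Hypothetical resolved storage mode for a session.
#[derive(Debug, PartialEq)]
pub enum Storage {
    /// No persistent attachment at all.
    EphemeralOnly,
    /// A path was given explicitly on the command line.
    PersistentAt(PathBuf),
    /// No path given; the caller falls back to the env var or platform default.
    PersistentDefault,
}

/// Returns the resolved mode plus any warnings to print on stderr.
pub fn reconcile(flags: RawFlags) -> Result<(Storage, Vec<String>), String> {
    let mut warnings = Vec::new();

    // Passing both the new flag and the deprecated one is an error.
    if flags.persistent_db.is_some() && flags.workspace.is_some() {
        return Err("--persistent-db and --workspace cannot be combined".to_string());
    }

    // The deprecated flag is still accepted, but produces a warning.
    let explicit = match (flags.persistent_db, flags.workspace) {
        (Some(path), _) => Some(path),
        (None, Some(path)) => {
            warnings.push("--workspace is deprecated; use --persistent-db".to_string());
            Some(path)
        }
        (None, None) => None,
    };

    match (flags.ephemeral_only, explicit) {
        // Assumption: any explicit path conflicts with --ephemeral-only.
        (true, Some(_)) => {
            Err("--ephemeral-only cannot be combined with a persistent path".to_string())
        }
        (true, None) => Ok((Storage::EphemeralOnly, warnings)),
        (false, Some(path)) => Ok((Storage::PersistentAt(path), warnings)),
        (false, None) => Ok((Storage::PersistentDefault, warnings)),
    }
}
```

Returning the warnings instead of printing them keeps the function free of side effects, so the caller decides where the deprecation message goes.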

Targeting either database from SQL

Tool calls default to the ephemeral primary. To reach the persistent attachment, use fully-qualified table references:

-- Read from persistent
SELECT * FROM "persistent"."public"."customers";

-- Write to persistent
CREATE TABLE "persistent"."public"."revenue_2026" AS
  SELECT region, SUM(amount) FROM scratch_orders GROUP BY region;

_table_catalog and _hyperdb_saved_queries now live in the persistent attachment automatically — no flag toggling, no manual migration.
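
For callers that build such statements programmatically, a small quoting helper keeps the three-part names consistent. The following is a minimal sketch, not code from this PR: quote_ident and qualified are hypothetical helper names, and the quoting rule used (wrap in double quotes, double any embedded double quote) is the standard SQL convention.

```rust
/// Quote one SQL identifier: wrap it in double quotes and double any
/// embedded double quote.
pub fn quote_ident(ident: &str) -> String {
    format!("\"{}\"", ident.replace('"', "\"\""))
}

/// Build a three-part name such as "persistent"."public"."customers".
pub fn qualified(database: &str, schema: &str, table: &str) -> String {
    [database, schema, table]
        .iter()
        .map(|part| quote_ident(part))
        .collect::<Vec<_>>()
        .join(".")
}
```

With these helpers, format!("SELECT * FROM {}", qualified("persistent", "public", "customers")) yields the first statement shown above, minus the trailing semicolon.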

Architecture

┌──────────────────── Engine ─────────────────────┐
│                                                 │
│ Connection ─────────────────┐                   │
│                             ▼                   │
│   ┌─ ephemeral primary ───────────────────┐     │
│   │  $TMPDIR/.../scratch.hyper            │     │
│   │  unqualified SQL routes here          │     │
│   └───────────────────────────────────────┘     │
│                                                 │
│   ┌─ ATTACH persistent ───────────────────┐     │
│   │  ~/Library/.../                       │     │
│   │     workspace.hyper                   │     │
│   │  qualified SQL: "persistent"...       │     │
│   │  _table_catalog,                      │     │
│   │  _hyperdb_saved_queries here          │     │
│   └───────────────────────────────────────┘     │
└─────────────────────────────────────────────────┘

Test results

All 442 tests pass across the workspace (parallel + sequential, hyperdb-mcp + hyperdb-api + sea-query-hyperdb + bootstrap).

10 new tests in tests/two_db_model_tests.rs cover the core contracts: engine attachment shape, ephemeral-only mode, ephemeral path uniqueness across engines, persistent writes surviving recreate, ephemeral writes vanishing on drop, resolve_target_db routing, status JSON shape, and the per-engine catalog-presence cache.

Iteration breakdown (9 commits)

| Commit  | Scope |
| ------- | ----- |
| b5def2b | Two-database engine model + platform-default persistent path resolver (paths module, dirs dep) |
| c2a7046 | CLI surface: --persistent-db, --ephemeral-only, --workspace deprecation, --bare removal |
| b552252 | Drop seed_catalog_on_create from AttachRegistry (always seed on create now) |
| 532133e | _table_catalog routed to persistent via fully-qualified SQL; pg_catalog.pg_tables probes for presence |
| ad61b27 | Saved queries (WorkspaceStore) routed to persistent |
| 964291b | Tests for the two-database model |
| cbd3635 | README / DEVELOPMENT.md / CHANGELOG documentation |
| a6d31f7 | Per-engine _table_catalog presence cache: primed by ensure_exists, reused by every catalog op for the engine's lifetime |
| fd21d53 | Watcher recovery after hyperd restart: pool now lives behind Arc<RwLock<Arc<Pool>>>; per-file ingest rebuilds the pool and retries once on connection-lost errors |

Deferred

A per-tool database parameter on query/execute/describe/etc. was originally part of this work (Iteration 6 in the plan) but is deferred to a follow-up PR. The current state lets the LLM target either database via fully-qualified SQL — that covers the most common cases. The per-tool parameter (and a persist: true flag on load_data/load_file) is roughly 200 LOC of plumbing across ingest paths and 24 tool handlers; it deserves its own focused PR.

Test plan

  • cargo test --workspace — 442/442 passing
  • cargo clippy -p hyperdb-mcp --tests — clean
  • cargo fmt --check — clean
  • CLI smoke test: --help shows new flags; --workspace emits deprecation warning; --persistent-db <PATH> overrides default; --ephemeral-only errors when combined with --persistent-db.
  • Manual: persistent file at platform default location is created on first run and reused across runs.
  • Manual: save_query from session A is visible to session B (different MCP client) using the same persistent file.

Migration notes for users

  • --bare users: drop the flag. If you want a clean .hyper without _table_catalog, run once normally, then DROP TABLE "persistent"."public"."_table_catalog". Subsequent opens won't recreate it.
  • --workspace users: rename to --persistent-db. Old flag still works with a warning.
  • Users who relied on data written via load_data/load_file ending up in the --workspace file: that data now lands in ephemeral by default. The next iteration adds a persist: true flag for ingest tools; until then, use query/execute with fully-qualified SQL like INSERT INTO "persistent"."public"."x" SELECT * FROM x.

…t path

Begins migrating hyperdb-mcp toward an ephemeral-primary, persistent-attached
session model. Each engine now holds two .hyper files at all times (when not
in --ephemeral-only mode):

- Ephemeral primary: $TMPDIR/hyperdb-mcp-<pid>-<seq>/scratch.hyper
  Created fresh per-engine, deleted on Drop. The connection is bound here;
  unqualified SQL routes here.
- Persistent attachment: at the platform-default location (or user-supplied
  path), attached under the reserved alias "persistent" during Engine::new.
  Survives across sessions.

Per-engine sequence number lets multiple Engines coexist in the same PID
(parallel test runners, restart-after-ConnectionLost) without colliding.
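
A minimal sketch of how a per-process sequence number can keep scratch directories apart, and how Drop can clean them up. EphemeralDir, create_in, and db_path are hypothetical names chosen for illustration; the PR's actual Engine code may be structured differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter; each new scratch directory takes the next value.
static NEXT_SEQ: AtomicU64 = AtomicU64::new(0);

/// Hypothetical owner of one engine's scratch directory.
pub struct EphemeralDir {
    dir: PathBuf,
}

impl EphemeralDir {
    /// Create `<base>/hyperdb-mcp-<pid>-<seq>/` and remember it for cleanup.
    pub fn create_in(base: &Path) -> io::Result<Self> {
        let seq = NEXT_SEQ.fetch_add(1, Ordering::Relaxed);
        let dir = base.join(format!("hyperdb-mcp-{}-{}", std::process::id(), seq));
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    /// Path of the scratch database file inside the directory.
    pub fn db_path(&self) -> PathBuf {
        self.dir.join("scratch.hyper")
    }
}

impl Drop for EphemeralDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because Drop cannot fail.
        let _ = fs::remove_dir_all(&self.dir);
    }
}
```

Calling EphemeralDir::create_in(&std::env::temp_dir()) twice in the same process yields two different directories, because the counter advances even though the PID is the same.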

New module:
- paths: cross-platform resolution for the persistent-db default location
  using `dirs::data_dir()`. Override via HYPERDB_PERSISTENT_DB env var.
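
The resolution order can be sketched with the inputs passed in explicitly, which keeps the function free of global state and easy to test. This is an assumption-laden illustration: resolve_persistent_path is a hypothetical name, the precedence of an explicit CLI path over the env var is assumed (the source only says the env var overrides the default), and treating an empty env value as unset is also an assumption. In the real module, data_dir would come from `dirs::data_dir()`.

```rust
use std::path::{Path, PathBuf};

/// Pick the persistent database path.
/// `env_override` stands in for the value of HYPERDB_PERSISTENT_DB;
/// `data_dir` stands in for the platform data directory.
pub fn resolve_persistent_path(
    cli_path: Option<&Path>,
    env_override: Option<&str>,
    data_dir: Option<&Path>,
) -> Option<PathBuf> {
    // Assumption: an explicit command-line path wins over everything else.
    if let Some(path) = cli_path {
        return Some(path.to_path_buf());
    }
    // Assumption: an empty env value is treated as unset.
    if let Some(value) = env_override.filter(|v| !v.is_empty()) {
        return Some(PathBuf::from(value));
    }
    // Platform default: <data dir>/hyperdb/workspace.hyper.
    data_dir.map(|dir| dir.join("hyperdb").join("workspace.hyper"))
}
```

A caller would typically pass std::env::var("HYPERDB_PERSISTENT_DB").ok().as_deref() for the second argument.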

Engine API changes:
- workspace_path() -> ephemeral_path() and new persistent_path()
- is_persistent() -> has_persistent() + persistent_was_just_created()
- new resolve_target_db() resolves an optional database alias for tools
- PERSISTENT_ALIAS = "persistent" const exposed
- Drop always cleans up ephemeral; persistent stays where the user put it
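
The routing contract of resolve_target_db, as the tests later in this PR describe it (no alias goes to the primary, "persistent" goes to the persistent attachment when present, and an InvalidArgument error otherwise), can be sketched as follows. The TargetDb and ResolveError types and the function signature are hypothetical, and rejecting every other alias is an assumption made to keep the sketch small.

```rust
pub const PERSISTENT_ALIAS: &str = "persistent";

/// Hypothetical result of alias resolution.
#[derive(Debug, PartialEq)]
pub enum TargetDb {
    Primary,
    Persistent,
}

/// Hypothetical error type; only the variant named in the PR is modeled.
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    InvalidArgument(String),
}

/// Resolve an optional database alias supplied by a tool call.
pub fn resolve_target_db(
    alias: Option<&str>,
    has_persistent: bool,
) -> Result<TargetDb, ResolveError> {
    match alias {
        // No alias: route to the ephemeral primary.
        None => Ok(TargetDb::Primary),
        // The reserved alias works only when the attachment exists.
        Some(PERSISTENT_ALIAS) if has_persistent => Ok(TargetDb::Persistent),
        Some(PERSISTENT_ALIAS) => Err(ResolveError::InvalidArgument(
            "no persistent database is attached (--ephemeral-only)".to_string(),
        )),
        // Assumption: this sketch rejects every other alias.
        Some(other) => Err(ResolveError::InvalidArgument(format!(
            "unknown database alias: {}",
            other
        ))),
    }
}
```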

attach::reset_search_path now respects the always-on persistent attachment:
it re-pins the search path to the primary's name instead of issuing RESET,
which would break unqualified-name resolution while the persistent
attachment stays attached.

Tests are all passing except for two clusters being addressed in later
iterations:
- saved_queries_tests (2 failures): WorkspaceStore still targets primary;
  iteration 7 will move it to the persistent attachment.
- table_catalog_tests (2 failures): catalog ensure_exists still targets
  primary; iteration 5 will make it presence-conditional and per-DB.

This is the second commit in a multi-step migration; --bare and --workspace
CLI flags still work (deprecation comes in iteration 3).

CLI changes:
- New `--persistent-db <PATH>`: replaces `--workspace`. Defaults to the
  platform data dir (or HYPERDB_PERSISTENT_DB env var). The deprecated
  `--workspace` is hidden but still accepted with a stderr warning;
  passing both is an error.
- New `--ephemeral-only`: skip persistent attachment entirely. Saved
  queries fall back to in-memory storage.
- Removed `--bare`. Catalog creation is now uniform (always seed when
  MCP creates a fresh .hyper file; never touch an existing file's
  catalog), so the opt-out flag became redundant. Users wanting a
  pristine .hyper for export can DROP TABLE _table_catalog after
  creation; subsequent opens won't recreate it.

HyperMcpServer:
- new() and with_no_daemon() drop the `bare` parameter.
- is_bare() removed.
- Saved-query store selection becomes "persistent if available, session
  otherwise" — same logic, but driven by --ephemeral-only instead of
  --bare.
- AttachRegistry no longer takes a catalog policy; `seed_catalog_on_create`
  becomes the constant default.

`_table_catalog` is now treated as an internal table by `is_internal_table`
so it doesn't appear in user-visible `describe_tables` output or
`total_rows`. The catalog's own `table_present`/`user_tables` helpers go
through the raw Catalog directly to bypass that filter.

attach::reset_search_path keeps the primary-name pin when a default
persistent attachment is in place — RESET to "$single" would break
unqualified resolution while persistent stays attached.

Tests updated:
- All `HyperMcpServer::new(path, ro, bare)` callsites lose their bare arg.
- Two `--bare`-specific tests deleted (bare_server_does_not_create_catalog,
  is_bare_reflects_constructor_argument).
- One detach-search-path test updated to expect the primary-name pin
  instead of "$single".
- Test helpers writing to "the workspace" updated to write into the
  persistent attachment via fully-qualified SQL.
- `table_exists` test helper goes through Catalog directly so it can see
  internal tables (which the new is_internal_table now filters out).

Five tests still failing in saved_queries and table_catalog:
- saved_queries: WorkspaceStore still targets primary; iteration 7.
- table_catalog: ensure_exists/reconcile/upsert_stub still target primary;
  iteration 5 will route them to the persistent attachment.

Now that --bare is gone, AttachRegistry's seed_catalog_on_create flag is
always true. Remove the field, the with_catalog_policy() constructor,
and the conditional in attach() — seeding now depends only on whether
MCP just created the file (file_was_created), matching the same uniform
policy the engine uses for the default persistent attachment.

The registry continues to hold *user-attached* databases only. The
default persistent attachment is owned by Engine itself and isn't
tracked in the registry; replay-on-reconnect re-issues only the user
attaches.

Tests:
- on_missing_create_does_not_seed_under_bare_policy deleted (the bare
  policy no longer exists; the matching positive test for non-bare
  policy stays as the canonical create-and-seed regression).

The catalog tracks tables the user wants to keep around, i.e. tables
in the persistent attachment. Ephemeral scratch tables aren't worth
cataloging because the database is replaced every session. So:

- ensure_exists / list / get / upsert_stub / set_metadata / delete_for /
  reconcile / refresh_row_count all qualify SQL with
  "persistent"."public"."_table_catalog".
- When no persistent attachment is present (--ephemeral-only), all
  catalog operations no-op gracefully (Ok with empty/None) instead of
  erroring; set_metadata is the only path that surfaces a clear
  ReadOnlyViolation since the user's intent there is mutation.
- user_tables / table_present / row_count_of probe persistent's
  pg_catalog.pg_tables directly via fully-qualified SQL.
- Removed HYPERDB_NAMED_INTERNAL_TABLES from is_internal_table — the
  catalog never appears in describe_tables now (which only enumerates
  the connection's primary, i.e. ephemeral) so no filter is needed.

Tests: catalog tests update their fixtures to seed user tables in
"persistent"."public" (sed-driven batch update: every CREATE
TABLE / INSERT INTO in those tests now qualifies the target db).
attach_tests.copy_create_stubs_table_catalog_on_primary_workspace
asserts catalog presence via persistent's pg_tables.

Two saved_queries tests still failing pending iteration 7.

WorkspaceStore now writes _hyperdb_saved_queries into the persistent
attachment ("persistent"."public"."_hyperdb_saved_queries") instead
of the connection's primary database. This matches user expectations:
saved queries are reference material that should outlive a single
session, and the persistent attachment is where curated, long-lived
data lives in the new model.

build_store remains driven by 'persistent path was supplied' — the
behavior is unchanged from the user's perspective. --ephemeral-only
sessions still get SessionStore (in-memory, dies with the process).

Eight new tests in tests/two_db_model_tests.rs covering the core
contracts of the new model:

- Engine::new(Some(path)) attaches the file as 'persistent' and
  has_persistent() reports true.
- Engine::new(None) produces an ephemeral-only engine; the 'persistent'
  alias is genuinely absent.
- Each Engine gets a distinct ephemeral path even when multiple Engines
  coexist in the same PID (parallel test runners, embedded uses).
- Persistent writes survive engine drop and are visible on recreate.
- Ephemeral writes are discarded on drop (the entire point).
- resolve_target_db routes None -> primary, 'persistent' -> persistent
  when present, errors with InvalidArgument when --ephemeral-only.
- Engine::status() exposes both database paths and the has_persistent
  flag. ephemeral_only mode reports persistent_path = null.

README:
- Operating Modes section now leads with the two-database concept and
  explains how to target either DB from SQL via fully-qualified names.
- New 'Database storage' table documents --persistent-db default,
  --ephemeral-only, and the deprecated --workspace alias.
- Saved-queries persistence note updated (queries now land in the
  persistent attachment automatically).
- CLI Reference rewritten to match the actual flag set; --bare removed.
- Examples migrated from --workspace to --persistent-db.

DEVELOPMENT.md:
- 'Workspace Modes Internals' replaced with 'Two-Database Engine
  Model' covering the per-engine ephemeral path naming, the
  schema_search_path pin to the primary, and the catalog/saved-queries
  routing helpers.

CHANGELOG.md:
- Added entries for the two-database engine model, platform-default
  persistent path, --persistent-db / --ephemeral-only flags, and the
  catalog / saved-queries persistence move.
- Added 'Removed' section documenting --bare retirement.

table_catalog::table_present previously ran a fresh
"persistent".pg_catalog.pg_tables probe on every catalog read/write
(get, list, delete_for, set_metadata, upsert_stub via ensure_exists).
For workloads that touch the catalog frequently — every ingest, every
DDL — that's one round-trip per call after the catalog has clearly
been created.

Add a catalog_present_cache: Mutex<Option<bool>> on Engine. The cache:

- Lives for the engine's lifetime, which is the right TTL: a
  ConnectionLost reconnect builds a fresh Engine, naturally resetting
  the cache.
- Short-circuits to Ok(false) in --ephemeral-only mode without
  running the probe at all.
- Gets primed to Some(true) by ensure_exists immediately after CREATE
  TABLE IF NOT EXISTS so the next catalog op skips the probe.
- Returns the cached value on every subsequent call until the engine
  is dropped.
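
The cache behavior listed above can be sketched in isolation. This is an illustration under stated assumptions, not the Engine code itself: CatalogPresence, is_present, and mark_present are hypothetical names, the probe is passed in as a closure so no database is needed, and the sketch holds the mutex while the probe runs for simplicity.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the engine-owned presence cache.
pub struct CatalogPresence {
    has_persistent: bool,
    cache: Mutex<Option<bool>>,
}

impl CatalogPresence {
    pub fn new(has_persistent: bool) -> Self {
        Self { has_persistent, cache: Mutex::new(None) }
    }

    /// Return whether the catalog table exists, running `probe` at most once.
    pub fn is_present<F>(&self, probe: F) -> Result<bool, String>
    where
        F: FnOnce() -> Result<bool, String>,
    {
        // Ephemeral-only mode: short-circuit without probing.
        if !self.has_persistent {
            return Ok(false);
        }
        let mut slot = self.cache.lock().unwrap();
        if let Some(cached) = *slot {
            return Ok(cached);
        }
        // First call: run the probe and remember the answer.
        let present = probe()?;
        *slot = Some(present);
        Ok(present)
    }

    /// Prime the cache after the catalog table has been created.
    pub fn mark_present(&self) {
        *self.cache.lock().unwrap() = Some(true);
    }
}
```

Because a failed probe returns early through `?`, an error leaves the cache empty and the next call probes again.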

Two new tests in two_db_model_tests verify the cache:
- catalog_presence_probe_is_cached: probe runs once, mark_catalog_present
  flips to true without running the probe again.
- catalog_presence_short_circuits_in_ephemeral_only: probe never runs
  when has_persistent() is false.

When the daemon restarts hyperd, every connection in a watcher's pool
becomes invalid. Previously the watcher would route every subsequent
file to 'failed/' until the user noticed and re-issued
watch_directory.

Now:
- The watcher's pool lives behind Arc<RwLock<Arc<Pool>>> so it can be
  swapped atomically.
- Each per-file ingest reads the current pool from the slot, calls
  ingest_one_ready_file, and inspects the result.
- If the result is a connection-lost error (per is_connection_lost),
  rebuild_watcher_pool builds a fresh Pool from the engine's *current*
  endpoint and the ingest retries exactly once on the new pool.
- Persistent failures (the retry also fails) fall through to the
  existing 'failed/' move logic so a permanently-broken file doesn't
  keep the watcher pinned in retry loops.
- The initial-sweep path uses the same recovery wrapper so a watcher
  registered just after a hyperd hiccup still ingests the backlog.
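
The swap-and-retry-once flow described above can be sketched with stand-in types. Pool, IngestError, PoolSlot, and ingest_with_recovery are hypothetical names for illustration; the real watcher uses its own Pool and McpError types and the is_connection_lost and rebuild_watcher_pool helpers named in this commit.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for a connection pool; `generation` just tells pools apart.
pub struct Pool {
    pub generation: u32,
}

/// Stand-in error type with a connection-lost case.
#[derive(Debug, PartialEq)]
pub enum IngestError {
    ConnectionLost,
    Other(String),
}

/// The pool lives behind a lock so it can be swapped for a fresh one.
pub type PoolSlot = Arc<RwLock<Arc<Pool>>>;

/// Run `ingest` on the current pool; on a connection-lost error, install a
/// rebuilt pool in the slot and retry exactly once.
pub fn ingest_with_recovery<I, R>(
    slot: &PoolSlot,
    mut ingest: I,
    rebuild: R,
) -> Result<u64, IngestError>
where
    I: FnMut(&Pool) -> Result<u64, IngestError>,
    R: FnOnce() -> Pool,
{
    // Clone the inner Arc so the read lock is released before ingesting.
    let pool: Arc<Pool> = slot.read().unwrap().clone();
    match ingest(&*pool) {
        Err(IngestError::ConnectionLost) => {
            let fresh = Arc::new(rebuild());
            *slot.write().unwrap() = Arc::clone(&fresh);
            // One retry only; a second failure is returned to the caller,
            // which can then apply its usual failure handling.
            ingest(&*fresh)
        }
        other => other,
    }
}
```

Keeping the inner Arc<Pool> separate from the lock means an in-flight ingest keeps a valid handle to the old pool while another task swaps in the new one.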

Internal refactor:
- Pure ingest path extracted into ingest_one_ready_file (returns
  Result<u64, McpError> with no file-system side effects).
- process_ready_with_recovery wraps it: handles symlink rejection,
  the retry-on-connection-lost loop, and the success/failure file
  moves. The old process_ready_async is gone in favor of these two.

The DEVELOPMENT.md known-limitation entry for watchers is updated to
reflect the new behavior; CHANGELOG gets a Fixed entry.

StefanSteiner merged commit 025ffa7 into tableau:main on May 25, 2026
10 checks passed