Skip to content

mcp-data-platform-v1.75.1

Choose a tag to compare

@github-actions github-actions released this 31 May 08:36
· 50 commits to main since this release
3740ca7

Patch: tools index stops re-syncing forever

A bug-fix release that resolves a long-standing churn in the tools semantic index. No migration, no configuration change, no API change; safe to roll forward.

What was wrong

The tools index consumer could never reach a settled "in sync" state. Its FindGaps was a stub that unconditionally reported the tool corpus as out of sync, so the reconciler enqueued a tools index job every 5 minutes, on every replica, forever, regardless of whether anything had changed. The vectors themselves were complete and stable; the indexer simply had no concept of "in sync," so it kept re-checking indefinitely.

This was not data loss and not re-embedding: the worker's text-hash dedup skips the embedding provider for unchanged descriptors, so each pass completed in about 0.1s. The cost was perpetual no-op churn, a steady stream of identical job rows that never stopped, unbounded index_jobs growth, and a dashboard that showed tools as "Healthy, last synced 2m ago" indefinitely instead of settling the way a real gap-checked kind (api_catalog) does.

The fix

tools FindGaps now performs a real content diff instead of a constant. On each reconcile sweep it compares the live tool registry against the persisted vectors by descriptor text hash, and returns the corpus only when something has actually drifted (a tool added, removed, its description overridden, or its visibility flipped), and nothing when the index is in sync. A steady-state corpus produces no jobs, so the index settles and index_jobs row growth from the tools heartbeat stops.

A content hash (not a count comparison) is used deliberately: the tool set lives in the running process, so a description edit or a visibility flip changes the live descriptor without changing the stored vector count. The diff hashes the same descriptor text the worker embeds, so it catches exactly the drift a count check would miss. Model or dimension changes (an embedding-provider swap) remain a restart-time concern, handled by the boot re-index against the new provider, not the per-sweep reconciler.

Behavior change to note

A kind-wide "Re-index" of an already-in-sync tools corpus now enqueues nothing and reports count: 0. This matches api_catalog and the documented /admin/index-jobs/reindex contract ("re-enqueue every out-of-sync unit"); tools was previously the only kind that always re-enqueued, as a side effect of the stub. Per-unit re-index (with an explicit source_id) is unchanged and still forces a full re-embed.

Upgrade notes

  • No migration and no configuration change. Roll the new build out and the tools reconciler stops enqueuing no-op jobs once the corpus is in sync; a real tool change still converges within one reconcile interval.
  • Existing tools vectors and persisted text hashes are unaffected (the shared hash is byte-identical to the previous implementation).

Changelog

  • fix(toolsindex): real content-diff FindGaps so tools stops re-syncing forever (#511) (#512) (@cjimti)

Full changelog: v1.75.0...v1.75.1

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v1.75.1

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_1.75.1_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.75.1_linux_amd64.tar.gz