mcp-data-platform-v1.71.0
Overview
This release generalizes the api-catalog embedding queue into a reusable, source-kind-agnostic indexing-job framework (pkg/indexjobs) and migrates the api-catalog toolkit to consume it as the first client. It is an internal infrastructure change: there are no new tools, no admin-API changes, and no configuration changes for users or operators. It does, however, carry two database migrations (see Upgrade notes below).
The framework is the foundation for upcoming semantic-search consumers (tool discovery, prompt library, knowledge-insight recall, portal asset search), each of which will plug in as a small Source + Sink rather than forking the queue.
Upgrade notes
This release applies two migrations on startup:
- 000051_index_jobs adds the shared index_jobs queue table, keyed on an opaque (source_kind, source_id) pair.
- 000052_drop_api_catalog_embedding_jobs removes the old per-toolkit api_catalog_embedding_jobs table.
No operator action is required. The dropped table held only transient queue rows. On first boot after the upgrade, the reconciler compares each spec's operation_count against its persisted vector count and re-enqueues any gaps; the worker re-converges them through the new queue. During that brief re-convergence window, api_list_endpoints semantic/hybrid ranking falls back to lexical for any not-yet-reindexed spec, exactly as it does for a freshly added spec.
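The reconciler's gap check amounts to comparing two counts per spec. The Go sketch below is a hypothetical illustration, not the shipped reconciler: specCounts and findGaps are invented names, and it assumes a "gap" means fewer persisted vectors than declared operations.

```go
package main

import "fmt"

// specCounts is a hypothetical per-spec view pairing a spec's operation_count
// with the number of vectors persisted for it in api_catalog_operation_embeddings.
type specCounts struct {
	SpecID         string
	OperationCount int
	VectorCount    int
}

// findGaps returns the specs whose persisted vector count falls short of their
// operation count. A reconciler would enqueue one index job per returned ID.
// Assumption: a gap means fewer vectors than operations.
func findGaps(specs []specCounts) []string {
	var gaps []string
	for _, s := range specs {
		if s.VectorCount < s.OperationCount {
			gaps = append(gaps, s.SpecID)
		}
	}
	return gaps
}

func main() {
	specs := []specCounts{
		{SpecID: "petstore", OperationCount: 20, VectorCount: 20}, // fully indexed
		{SpecID: "billing", OperationCount: 35, VectorCount: 0},   // never indexed
		{SpecID: "search", OperationCount: 12, VectorCount: 7},    // partially indexed
	}
	fmt.Println(findGaps(specs)) // [billing search]
}
```

Because the check only reads counts, it is cheap to run periodically, which is why losing the transient queue rows costs nothing but a short re-indexing pass.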
The api-catalog vector table (api_catalog_operation_embeddings) is untouched and keeps its ON DELETE CASCADE to api_catalog_specs, so no embedding data is recomputed or moved and spec deletion still cascades to its vectors.
Rollback is supported: the 000052 down migration recreates api_catalog_embedding_jobs.
What changed
New: pkg/indexjobs framework
A Postgres-backed job queue generic over (source_kind, source_id). The queue mechanics are the proven pattern from the api-catalog queue: lease-based claim with FOR UPDATE SKIP LOCKED, exponential-backoff retry, a reaper that releases expired leases, LISTEN/NOTIFY low-latency wake-ups, and a periodic gap reconciler. Consumers implement two small contracts:
- Source declares what text to embed for a source_id and an optional post-embed hook.
- Sink declares where vectors live and how to detect gaps for that kind.
The framework owns everything in between: SHA-256 text-hash dedup, batched embedding-provider calls, chunk-boundary progress, incremental persistence, and the full claim/lease/retry/reaper/reconcile state machine. One worker pool, one reaper, and one reconciler serve every registered kind, routing by the source_kind on each job row.
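Two of the pieces the framework owns, SHA-256 text-hash dedup and batched provider calls, are easy to illustrate in isolation. The Go sketch below assumes one stored hash per embedded item that is compared before calling the provider; the pending and batches helpers and the op-N item keys are hypothetical, and the batch size stands in for the batch_size setting.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// textHash is the hex SHA-256 of the text that would be embedded.
func textHash(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

// pending returns, sorted, the keys whose current text hash differs from the
// hash stored when they were last embedded (or that have no stored hash), so
// unchanged texts never reach the embedding provider.
func pending(texts, storedHashes map[string]string) []string {
	var keys []string
	for key, text := range texts {
		if storedHashes[key] != textHash(text) {
			keys = append(keys, key)
		}
	}
	sort.Strings(keys)
	return keys
}

// batches splits keys into consecutive chunks of at most size, one chunk per
// embedding-provider call.
func batches(keys []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		out = append(out, keys[start:end])
	}
	return out
}

func main() {
	texts := map[string]string{
		"op-1": "GET /pets",
		"op-2": "POST /pets",
		"op-3": "DELETE /pets/{id}",
	}
	// op-1 was embedded before and its text is unchanged.
	stored := map[string]string{"op-1": textHash("GET /pets")}

	todo := pending(texts, stored)
	fmt.Println(todo)             // [op-2 op-3]
	fmt.Println(batches(todo, 1)) // [[op-2] [op-3]]
}
```

Hashing the text rather than comparing timestamps means a re-enqueued job whose content did not change costs no provider calls.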
api-catalog migrated to the framework
The api-catalog toolkit is the first consumer (pkg/toolkits/apigateway/catalogindex). Its Sink writes the existing api_catalog_operation_embeddings table; an AdminStore backs the admin handler from index_jobs joined to the api-catalog tables.
- The admin endpoints (/api/v1/admin/api-catalogs/{id}/embedding-status, embedding-health, embedding-jobs) keep identical URLs and JSON shapes.
- The apigateway.embed_jobs.* configuration keys (workers, batch_size, lease_duration, embed_timeout) are unchanged.
Removed
The pkg/toolkits/apigateway/embedjobs package is removed; its behavior is now provided by pkg/indexjobs plus the api-catalog Source/Sink.
Compatibility
- MCP tools: unchanged.
- Admin REST API: unchanged (URLs and response shapes).
- Configuration: unchanged.
- Database: two additive/cleanup migrations applied automatically; rollback supported.
Commits
- 3ec5a1a: feat(indexjobs): reusable indexing-job framework + api-catalog migration (#438) (#503) (@cjimti)
Installation
Homebrew (macOS)
brew install txn2/tap/mcp-data-platform
Claude Code CLI
claude mcp add mcp-data-platform -- mcp-data-platform
Docker
docker pull ghcr.io/txn2/mcp-data-platform:v1.71.0
Verification
All release artifacts are signed with Cosign. Verify with:
cosign verify-blob --bundle mcp-data-platform_1.71.0_linux_amd64.tar.gz.sigstore.json \
mcp-data-platform_1.71.0_linux_amd64.tar.gz