mcp-data-platform-v1.67.1

github-actions released this 25 May 22:58
· 80 commits to main since this release
443fae7

Highlights

  • Closes #479: apigateway embed-job doom loop. Large OpenAPI specs on CPU-only embedders no longer get stuck in an invisible retry loop. The embed worker now heartbeats its lease, persists each chunk as it completes, and the portal distinguishes a retry from a fresh queue.
  • UI: connection editor lands on the saved connection (#469), the connection-side companion to the catalog landing fix in #458.

What changed

apigateway: heartbeat, per-chunk persistence, configurable embed-job knobs (#480)

The embed-job worker previously held a fixed 10-minute lease, computed every chunk in memory, then persisted the whole spec in a single atomic write at the end. On CPU-only embedders, a ~150-operation spec ran about 10m30s per attempt, hit the reaper's lease ceiling at the last chunk, threw away every prior chunk's work, and retried from scratch. The portal showed queued through every retry and never surfaced the failure.

This release closes that path with three coordinated changes:

1. Heartbeat keeps the lease alive while Compute progresses.
A goroutine renews lease_expires_at every lease_duration / 3 while the embed pass is running. A slow batch on CPU Ollama no longer looks abandoned to the reaper at the 10-minute mark. The earlier worker-side context ceiling that defeated this (canceling Compute at lease_duration + 30s regardless of lease state) has been removed; a 1-hour processSafetyBound exists only as a wall-clock backstop against a Compute that hangs without forward progress.
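
The heartbeat pattern described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's worker code: leaseStore, renew, heartbeat, and runWithHeartbeat are hypothetical names, the lease lives in memory rather than in a database row, and the timings are scaled from minutes down to milliseconds.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Demo timings, scaled down from minutes to milliseconds.
const (
	demoLease = 300 * time.Millisecond
	demoWork  = time.Second
)

// leaseStore is a hypothetical in-memory stand-in for the job row; only
// lease_expires_at is modeled.
type leaseStore struct {
	mu             sync.Mutex
	leaseExpiresAt time.Time
	renewals       int
}

// renew re-stamps lease_expires_at one full lease window from now.
func (s *leaseStore) renew(lease time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.leaseExpiresAt = time.Now().Add(lease)
	s.renewals++
}

// heartbeat renews the lease every lease/3 until ctx is canceled.
func heartbeat(ctx context.Context, s *leaseStore, lease time.Duration) {
	ticker := time.NewTicker(lease / 3)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			s.renew(lease)
		}
	}
}

// runWithHeartbeat claims a lease, runs a simulated embed pass of length
// work with the heartbeat alongside it, and reports whether the lease was
// still live at the end and how many times it was renewed.
func runWithHeartbeat(lease, work time.Duration) (alive bool, renewals int) {
	s := &leaseStore{leaseExpiresAt: time.Now().Add(lease)} // initial claim
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		heartbeat(ctx, s, lease)
	}()

	time.Sleep(work) // stand-in for a slow Compute
	cancel()
	<-done // wait for the heartbeat goroutine to exit

	s.mu.Lock()
	defer s.mu.Unlock()
	return time.Now().Before(s.leaseExpiresAt), s.renewals
}

func main() {
	alive, renewals := runWithHeartbeat(demoLease, demoWork)
	fmt.Printf("lease alive: %v after %d renewals\n", alive, renewals)
}
```

In this sketch the simulated work runs for more than three lease windows, so without the heartbeat goroutine the lease would have expired well before the work finished.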

2. Per-chunk persistence preserves progress across retries.
A new additive UpsertOperationEmbeddingsBatch (INSERT ... ON CONFLICT DO UPDATE) writes each chunk's vectors to api_catalog_operation_embeddings immediately. A job that fails on chunk N leaves chunks 0..N-1 visible to the next attempt's dedup pass, which skips the upstream re-embed for those operations. The final atomic UpsertOperationEmbeddings at job completion still runs as the canonical full replacement so operations removed from the spec are cleaned up.
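
To illustrate why per-chunk writes make a retry cheap, here is a small in-memory simulation. It is a sketch under assumptions: embedStore, upsertBatch, runAttempt, and failingAfter are hypothetical names, the store is a map rather than the api_catalog_operation_embeddings table, the dedup pass is reduced to "skip operations that already have a stored vector", and the final full-replacement write is not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// embedStore is an in-memory stand-in for the embeddings table, keyed by a
// hypothetical operation ID.
type embedStore map[string][]float32

// upsertBatch mimics INSERT ... ON CONFLICT DO UPDATE: existing keys are
// overwritten, new keys are added, and nothing is deleted.
func (s embedStore) upsertBatch(vecs map[string][]float32) {
	for id, v := range vecs {
		s[id] = v
	}
}

// embedFunc is a hypothetical upstream embedder call for one batch.
type embedFunc func(ids []string) (map[string][]float32, error)

// runAttempt skips operations already in the store, embeds the rest in
// batches, and persists each batch as soon as it completes. It returns the
// number of operations embedded upstream during this attempt.
func runAttempt(s embedStore, ops []string, batchSize int, embed embedFunc) (int, error) {
	var todo []string
	for _, id := range ops {
		if _, ok := s[id]; !ok {
			todo = append(todo, id)
		}
	}
	embedded := 0
	for start := 0; start < len(todo); start += batchSize {
		end := start + batchSize
		if end > len(todo) {
			end = len(todo)
		}
		vecs, err := embed(todo[start:end])
		if err != nil {
			return embedded, err // earlier batches stay persisted
		}
		s.upsertBatch(vecs)
		embedded += len(vecs)
	}
	return embedded, nil
}

// failingAfter returns a fake embedder that succeeds for n calls, then fails.
func failingAfter(n int) embedFunc {
	calls := 0
	return func(ids []string) (map[string][]float32, error) {
		calls++
		if calls > n {
			return nil, errors.New("embedder timeout")
		}
		out := make(map[string][]float32, len(ids))
		for _, id := range ids {
			out[id] = []float32{float32(len(id))} // placeholder vector
		}
		return out, nil
	}
}

func main() {
	store := embedStore{}
	ops := []string{"a", "b", "c", "d", "e"}

	n, err := runAttempt(store, ops, 2, failingAfter(2))
	fmt.Println("attempt 1: embedded", n, "err:", err)

	n, err = runAttempt(store, ops, 2, failingAfter(10))
	fmt.Println("attempt 2: embedded", n, "err:", err, "stored:", len(store))
}
```

The second attempt only has to embed the operations that the failed attempt never reached, which is the property the release relies on to break the retry loop.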

3. Operator-visible retry state.
A pending job with attempts > 0 now renders in the catalog status badge as retrying (N tries) with the upstream error in the tooltip, instead of showing queued, which was indistinguishable from a fresh job. A doom-looping job is visible at a glance.
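
The badge rule can be written as a small pure function. The sketch below is in Go for consistency with the other examples; the portal's actual UI code is not shown in these notes, and the function name, parameter names, and the pass-through of other statuses are assumptions.

```go
package main

import "fmt"

// badge returns the label and tooltip for a job. Statuses other than
// "pending" are passed through unchanged; that pass-through is an assumption.
func badge(status string, attempts int, lastErr string) (label, tooltip string) {
	if status != "pending" {
		return status, ""
	}
	if attempts > 0 {
		// A pending job that has already been tried is a retry, not a fresh queue.
		return fmt.Sprintf("retrying (%d tries)", attempts), lastErr
	}
	return "queued", ""
}

func main() {
	for _, attempts := range []int{0, 3} {
		label, tip := badge("pending", attempts, "embedder timeout")
		fmt.Printf("attempts=%d label=%q tooltip=%q\n", attempts, label, tip)
	}
}
```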

New configuration knobs

Under apigateway.embed_jobs:

apigateway:
  embed_jobs:
    workers: 1            # goroutines per pod sharing the queue
    embed_timeout: 5m     # per-batch HTTP call timeout
    lease_duration: 10m   # claim window the heartbeat re-stamps at lease_duration/3
    batch_size: 32        # texts per upstream EmbedBatch call

| Field | Default | Effect |
| --- | --- | --- |
| embed_timeout | 5m | Per-batch HTTP timeout against the embedder. |
| lease_duration | 10m | DB lease window; heartbeat re-stamps at lease_duration / 3. |
| batch_size | 32 | Texts per upstream EmbedBatch call. |
| workers | 1 | Goroutines per pod. CPU embedders saturate at 1; GPU embedders benefit from 2-4. |

Startup logs a warning when embed_timeout >= lease_duration.
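
A minimal sketch of how the defaults and the startup check fit together is shown below. The struct and function names are illustrative rather than the project's actual types, and the warning text is invented for the example; only the default values, the lease_duration / 3 cadence, and the embed_timeout >= lease_duration condition come from these notes.

```go
package main

import (
	"fmt"
	"time"
)

// EmbedJobsConfig mirrors the four documented knobs; the Go field names are
// illustrative, not the project's actual struct.
type EmbedJobsConfig struct {
	Workers       int
	EmbedTimeout  time.Duration
	LeaseDuration time.Duration
	BatchSize     int
}

// withDefaults fills unset (zero) fields with the documented defaults.
func withDefaults(c EmbedJobsConfig) EmbedJobsConfig {
	if c.Workers == 0 {
		c.Workers = 1
	}
	if c.EmbedTimeout == 0 {
		c.EmbedTimeout = 5 * time.Minute
	}
	if c.LeaseDuration == 0 {
		c.LeaseDuration = 10 * time.Minute
	}
	if c.BatchSize == 0 {
		c.BatchSize = 32
	}
	return c
}

// heartbeatInterval is the documented re-stamp cadence, lease_duration / 3.
func heartbeatInterval(c EmbedJobsConfig) time.Duration {
	return c.LeaseDuration / 3
}

// timeoutWarning returns a non-empty message when embed_timeout is greater
// than or equal to lease_duration, the condition the startup check flags.
func timeoutWarning(c EmbedJobsConfig) string {
	if c.EmbedTimeout >= c.LeaseDuration {
		return fmt.Sprintf("embed_timeout (%s) >= lease_duration (%s)",
			c.EmbedTimeout, c.LeaseDuration)
	}
	return ""
}

func main() {
	def := withDefaults(EmbedJobsConfig{})
	fmt.Println("default heartbeat interval:", heartbeatInterval(def))

	// Raising only embed_timeout leaves lease_duration at its 10m default,
	// which trips the warning.
	bad := withDefaults(EmbedJobsConfig{EmbedTimeout: 15 * time.Minute})
	fmt.Printf("warning: %q\n", timeoutWarning(bad))
}
```

The second case in main shows an easy mistake: raising embed_timeout alone while lease_duration stays at its default.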

Operator notes for CPU-only embedders

The defaults are tuned for a GPU embedder. Operators running CPU-only Ollama against large specs (roughly 120 operations or more) should raise both timing knobs, for example:

apigateway:
  embed_jobs:
    embed_timeout: 15m
    lease_duration: 20m

No database migrations. Existing configurations work as-is.

UI: connection editor lands on the saved connection (#469)

After saving a new connection, the panel previously snapped to the first listed connection instead of the one just saved. ConnectionEditor.onSave now passes (kind, name) so the panel sets selectedKey to the freshly-saved connection, and the auto-correct effect gates on !isFetching so the new selection survives the post-save refetch window. Same class of bug as the catalog landing fix in #458.

Dependencies

  • github.com/pgvector/pgvector-go 0.3.0 → 0.4.0 (#471)
  • golangci/golangci-lint-action 9.2.0 → 9.2.1 (#470)

Upgrade

  • No database migrations.
  • Existing configmaps continue to work; the new apigateway.embed_jobs.batch_size and lease_duration fields default to the prior hardcoded values (32 and 10m).
  • Recommended for CPU-only embedders processing large specs: raise embed_timeout and lease_duration per the operator notes above.

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v1.67.1

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_1.67.1_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.67.1_linux_amd64.tar.gz

Full changelog

  • fix(apigateway): heartbeat, per-chunk persistence, configurable embed-job knobs (#479, #480) — @cjimti
  • fix(ui): land on saved connection instead of first listed (#469) — @cjimti
  • deps: bump github.com/pgvector/pgvector-go from 0.3.0 to 0.4.0 (#471) — @dependabot
  • ci: bump golangci/golangci-lint-action from 9.2.0 to 9.2.1 (#470) — @dependabot

Full diff: v1.67.0...v1.67.1