mcp-data-platform-v1.64.1

github-actions released this 19 May 23:45 · commit c91e711 · 95 commits to main since this release

Highlights

Hotfix for the api-gateway embed-jobs worker timing out on every spec write against CPU-only Ollama. On v1.64.0, saving a spec with 26+ operations left the badge cycling between indexing and failed indefinitely; this release stops that loop and adds the regression test that would have caught it before v1.64.0 shipped.

What changed

v1.64.0 introduced the batched /api/embed endpoint for embedding requests but kept the 30-second HTTP timeout that had been tuned for the older singular /api/embeddings path. A batched POST of 26+ texts on CPU-only Ollama routinely exceeds 30 seconds, so the worker timed out on every attempt, retried five times, and failed terminally. The reconciler then re-enqueued the job because the operation count and embedding count still disagreed, so the cycle never converged.

The fix is scoped to the asynchronous worker. Request-path embedding callers (memory_recall, memory_manage, capture_insight, apigateway queryVectorFor) continue to use the 30s default because their singular calls return in 1-3 seconds on CPU Ollama and operators do not want a wedged Ollama to hold an MCP tools/call open for minutes. Only the embed-jobs worker gets the longer timeout, scoped via a new config knob.

Operator-visible changes

New configuration

apigateway:
  embed_jobs:
    workers: 1
    embed_timeout: 5m
Field: embed_jobs.embed_timeout
Default: 5m
Affects: The api-gateway embed-jobs worker's batched /api/embed POSTs against Ollama. Request-path embedding calls remain governed by memory.embedding.ollama.timeout (default 30s).

GPU-backed Ollama deployments can tighten this value to keep the worker's failure floor short; CPU-only deployments should leave it at the default.

Behavior preserved

  • No wire-format changes.
  • No database migrations.
  • No public API changes.
  • No config breakage: deployments that did not set the new key get the documented 5m default automatically.

Upgrade notes

  • No operator action required beyond rolling the pod. The new default behaves correctly for both CPU and GPU Ollama deployments. Operators who already worked around the bug on v1.64.0 by setting memory.embedding.ollama.timeout: 5m can leave that override in place or remove it; the worker no longer depends on it.
  • Tightening the worker timeout on GPU embedders is now possible. Operators on fast embedders who want a short failure floor on the worker (e.g. 30 seconds) can set apigateway.embed_jobs.embed_timeout: 30s without affecting other consumers.

Detailed changes

  • #445 / #446. Scope the long batch timeout to the embed-jobs worker. embedding.DefaultTimeout stays at 30s. New apigateway.embed_jobs.embed_timeout config (default 5m). New Platform.workerEmbedder() helper constructs a dedicated Ollama provider with that timeout; the shared p.embeddingProv used by request-path callers is unchanged. Three new unit tests cover the scoping, including a regression-prevention assertion that the shared instance keeps its 30s default when the worker timeout is configured.

    New internal/testollama helper and pkg/platform/integration_embedjobs_realollama_test.go (build tag integration) drive the production embedding path end-to-end against a real Ollama container running nomic-embed-text. Pre-v1.64.1 behavior would fail this test on the same batch shape that caused the incident; post-fix it passes with margin. This pattern is reusable by upcoming embedding consumers (DataHub semantic search, prompt discovery, knowledge insights recall, portal asset search) so synthetic-delay stubs are no longer the only coverage available.

Workaround on v1.64.0 (no upgrade required)

Add timeout: 5m under memory.embedding.ollama in the platform configmap and roll the deployment.

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v1.64.1

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_1.64.1_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.64.1_linux_amd64.tar.gz

Full changelog

v1.64.0...v1.64.1