mcp-data-platform-v1.66.0

@github-actions github-actions released this 20 May 08:12
· 92 commits to main since this release
fc38bb7

This release ships two changes: observability and shutdown hardening so freshly-deployed pods are immediately profileable and drain cleanly on SIGTERM, and an OpenAPI 3.1 resolver fix so api_get_endpoint_schema no longer drops oneOf / anyOf / allOf branches, response headers blocks, or const values.

Highlights

Metrics enabled by default (#449)

OTEL_METRICS_ENABLED now defaults to true. A freshly-deployed pod exposes Prometheus metrics on :9090/metrics without an env flip. The metrics listener is on a dedicated port isolated from the MCP and admin paths; restrict access with a NetworkPolicy. Set OTEL_METRICS_ENABLED=false to opt out.

Per-tool latency, apigateway upstream latency, and tool-call counters are now scrapable out of the box during sizing exercises and incidents.

Idempotent meter-provider shutdown (#449)

The OTel SDK's MeterProvider.Shutdown returns a "reader is shutdown" error on the second call. With the new default surfacing a real provider in tests, invoking Platform.Close twice would error. Metrics.Shutdown is now guarded by sync.Once with a cached error, and the call goes through an injectable shutdownFn so the error path is testable without forcing the SDK to misbehave.

Graceful shutdown wiring (#449)

cmd/mcp-data-platform/main.go was calling platform.Close() but never platform.Stop(). Every lifecycle.OnStop callback registered by the embed-jobs subsystem (worker, reaper, reconciler, LISTEN/NOTIFY listener) was being abandoned mid-flight on SIGTERM, leaving worker goroutines running while the DB pool closed underneath them.

Fixes:

  • closeServer now calls platform.Stop(ctx) with a 10s bounded context before platform.Close().
  • The embed-jobs OnStop closure is extracted into Platform.stopAPIGatewayEmbedJobs and runs each Stop call inside a new boundedStop helper that races the work against ctx.Done. A hung component cannot stall past the bounded deadline. Abandoned jobs are safe because their PostgreSQL leases expire and another replica reclaims them.

api_get_endpoint_schema surfaces full OpenAPI 3.1 contract (#450)

The schema flattener walked only properties and items, silently dropping three patterns that are canonical in OpenAPI 3.1:

  • oneOf: [$ref, {type: "null"}] (the 3.1 replacement for the 3.0 nullable: true keyword) collapsed to an empty object. Both the inlined $ref and the null branch disappeared. Same for anyOf and allOf.
  • Response headers blocks were stripped. Callers had no signal about Location on a 301, Retry-After on a 429, ETag / Last-Modified on cache-aware endpoints, or custom headers.
  • const values on properties (common as JSON:API discriminator fields, e.g. type: "show") were dropped, leaving an unconstrained string. enum with a single value followed the same code path.

Fixes:

  • ResponseDetail gains a Headers map[string]HeaderDetail field, marshaled as headers with omitempty.
  • New flattenResponseHeaders helper emits per-header description, required flag, and schema.
  • populateSchemaCompounds now recurses into oneOf / anyOf / allOf / not via a new flattenSchemaRefs helper. $ref branches get inlined by the existing schemaToValue recursion; type: "null" branches are preserved. The same maxSchemaDepth cap that protects against recursive schemas still applies.
  • populateSchemaScalars now copies s.Const alongside the existing scalar keywords.

Five new tests against a JSON:API style OpenAPI 3.1 fixture exercise every reported pattern, including nested cases (anyOf inside array items).

Operator impact

Existing installs

A rolling restart will start exposing /metrics on :9090. If you do not want that, set OTEL_METRICS_ENABLED=false in the deployment env. If you do want it but the port is not yet allowed in your NetworkPolicy, scrapers will fail until you allow it. The endpoint is unauthenticated by design (isolated listener); restrict with a NetworkPolicy.

Recommended manifest tuning

The existing terminationGracePeriodSeconds: 30 is tight for the new shutdown chain. The default budget is roughly 2s pre-delay + 25s HTTP grace + 10s lifecycle stop + close overhead, which sums to about 37s in the worst case and already exceeds the 30s grace period. Suggested:

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60

For deployments expecting higher traffic, see docs/reference/tuning-and-scaling.md for GOMEMLIMIT / GOMAXPROCS / GOGC guidance. Without them the Go runtime is not cgroup-aware: GOMAXPROCS defaults to the node's CPU count (often 32-64) inside a 500m cgroup quota, causing scheduler thrash; an unset GOMEMLIMIT means the GC will not pace against the memory limit.
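
One way to check what a running pod actually sees is a small diagnostic like the sketch below. It is not part of the project; it uses only standard-library calls and reports the effective GOMAXPROCS, the CPU count visible to the process, and whether a soft memory limit is set.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"runtime/debug"
)

// runtimeSettings reports the effective scheduler and GC limits.
// GOMAXPROCS(0) and SetMemoryLimit(-1) only read the current
// values; neither call changes the setting.
func runtimeSettings() (procs, cpus int, memLimit int64) {
	return runtime.GOMAXPROCS(0), runtime.NumCPU(), debug.SetMemoryLimit(-1)
}

func main() {
	procs, cpus, limit := runtimeSettings()
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d\n", procs, cpus)

	// math.MaxInt64 is the runtime's default, meaning no limit.
	if limit == math.MaxInt64 {
		fmt.Println("GOMEMLIMIT: not set (no soft memory limit)")
	} else {
		fmt.Printf("GOMEMLIMIT=%d bytes\n", limit)
	}
}
```

If GOMAXPROCS is far above the container's CPU quota, or no memory limit is reported inside a memory-limited pod, the env vars described in the tuning guide are likely worth setting.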

Compatibility

  • Tool inputs, tool names, configuration schema, migrations, and wire-format identifiers are unchanged.
  • api_get_endpoint_schema output is additive: the new headers field and the oneOf / anyOf / allOf / not / const keys appear only for specs that declare them. Clients that ignore unknown JSON fields are unaffected.
  • The metrics-default flip is the only behavior change at deploy time. Pin OTEL_METRICS_ENABLED=false in your env if you need the previous posture.

New documentation

  • docs/reference/tuning-and-scaling.md walks through resource sizing by traffic tier, Go runtime env vars, the HA-safety matrix for multi-replica deployments, PostgreSQL pool sizing per replica count, a four-stage graceful-shutdown budget tied to terminationGracePeriodSeconds, and an HPA example.

Changelog

Features

  • 6440705: feat(observability): enable metrics by default and harden shutdown (#449) (@cjimti)

Bug Fixes

  • fc38bb7: fix(apigateway): surface oneOf/anyOf/allOf, response headers, and const in api_get_endpoint_schema (#450) (@cjimti)

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v1.66.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_1.66.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.66.0_linux_amd64.tar.gz