mcp-data-platform-v1.58.0

@github-actions github-actions released this 07 May 16:30
· 146 commits to main since this release
bc0ba67

Two production bugs in the gateway toolkit are fixed in this release, plus a wide-scope code-quality cleanup pass.

TL;DR — what changed and what to do

OAuth authorization_code upstreams
  • Before v1.58.0: Refresh token died ~30 min after the SSO session went idle. Every pod restart forced a manual Reconnect.
  • After v1.58.0: Refresh tokens survive platform restarts. One Reconnect per upstream after upgrade and you're done.

Tool list freshness
  • Before v1.58.0: When a gateway upstream re-authenticated or a connection was added/removed, downstream agents showed the stale tool list until the user disconnected and reconnected.
  • After v1.58.0: Downstream agents (Claude.ai, Claude Desktop) receive notifications/tools/list_changed live over a long-lived SSE channel. No reconnect required.

Operator action on upgrade: click Reconnect once per existing authorization_code gateway connection in the admin portal. Pre-existing connections still hold refresh tokens issued under the old (no-offline_access) flow; one Reconnect re-issues them under the new augmented scope. The admin Status surface shows an actionable hint when this is the case.


Bug A — Re-auth required after every server restart

authorization_code upstream connections were storing refresh tokens issued without the offline_access scope. Keycloak (and similar IdPs) tie the refresh-token lifetime to the user's interactive SSO Session Idle (~30 min default) when offline_access is missing, so any pod restart longer than that idle window invalidated the refresh token.

What changed

  • parseOAuthConfig now defaults offline_access into the requested scope for authorization_code grants:
    • Empty scope → openid profile email offline_access
    • Non-empty scope → append offline_access if missing (case-sensitive per RFC 6749 §3.3)
    • client_credentials is unchanged — no SSO session to outlive
  • OAuthConfig.OriginalScope + OAuthConfig.ScopeAugmented preserve the operator-supplied pre-augmentation scope so admin status can distinguish the two upgrade scenarios:
    • "Operator never asked for offline_access" → Reconnect will fix. Admin status shows: refresh rejected: persisted refresh grant predates the offline_access default; reauthorize so the IdP issues a token under the augmented scope (and ensure the IdP client has offline_access assignable)
    • "Operator asked, IdP still rejected" → IdP misconfigured. The upstream rejection message is preserved verbatim
  • The hint reaches the admin status surface even for the legacy un-customized case (operator never set oauth_scope at all — the most common upgrade path)
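
The scope rules above can be sketched in a few lines of Go. This is a minimal illustration, not the platform's parseOAuthConfig: the scopeResult type, its field names, the augmentScope and refreshHint functions, and the shortened hint string are all hypothetical stand-ins for the behavior described in the bullets.

```go
package main

import (
	"fmt"
	"strings"
)

const offlineAccess = "offline_access"

// scopeResult loosely mirrors the idea behind OAuthConfig.OriginalScope and
// OAuthConfig.ScopeAugmented. The type and field names are hypothetical.
type scopeResult struct {
	Original  string // scope exactly as the operator supplied it
	Effective string // scope actually requested from the IdP
	Augmented bool   // true when offline_access was added by default
}

// augmentScope applies the defaulting rules from the release notes:
// only authorization_code grants are touched, an empty scope gets the
// full default, and a non-empty scope gets offline_access appended when
// it is missing (compared case-sensitively).
func augmentScope(grantType, scope string) scopeResult {
	res := scopeResult{Original: scope, Effective: scope}
	if grantType != "authorization_code" {
		return res // client_credentials and others are left unchanged
	}
	fields := strings.Fields(scope)
	if len(fields) == 0 {
		res.Effective = "openid profile email " + offlineAccess
		res.Augmented = true
		return res
	}
	for _, f := range fields {
		if f == offlineAccess {
			return res // operator already asked for it
		}
	}
	res.Effective = strings.Join(append(fields, offlineAccess), " ")
	res.Augmented = true
	return res
}

// refreshHint picks the admin-status message for a rejected refresh.
// The hint text is abbreviated here; upstreamMsg is passed through
// verbatim when the operator had explicitly requested offline_access.
func refreshHint(res scopeResult, upstreamMsg string) string {
	if res.Augmented {
		return "refresh rejected: reauthorize so the IdP issues a token under the augmented scope"
	}
	return upstreamMsg
}

func main() {
	for _, s := range []string{"", "openid", "openid offline_access"} {
		res := augmentScope("authorization_code", s)
		fmt.Printf("%q -> %q (augmented=%v)\n", s, res.Effective, res.Augmented)
	}
	fmt.Println(refreshHint(augmentScope("authorization_code", ""), "invalid_grant"))
}
```

Keeping the operator's original scope alongside the effective one is what lets the status surface tell "Reconnect will fix this" apart from "the IdP is misconfigured".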

Keycloak setup note

For the augmentation to actually issue offline tokens, the offline_access client scope must be Optional or Default on the gateway client:

Realm → Clients → <gateway client> → Client Scopes → Setup → ensure offline_access is listed under Optional or Default

The user completing the Connect flow must also have the offline_access realm role (granted by default in most realms).


Bug B — Downstream agents miss tools/list_changed in stateless mode

The Go MCP SDK refuses GET-for-SSE and closes sessions at end-of-request when running in stateless streamable HTTP mode (the production multi-replica deployment shape). The result: when a gateway upstream re-authenticated after Bug A's fix landed, downstream agents kept showing the stale tool list because the SDK's native push channel was disabled.

What changed — broadcaster pipeline replacing the SDK's missing native push

graph LR
    GW[Gateway toolkit<br/>AddTool / RemoveTool] -->|debounced 50ms| NotifyAdapter[gatewayListChangedNotifier]
    NotifyAdapter --> Broker[session.Broadcaster]
    Broker --> Memory[MemoryBroadcaster<br/>single-replica]
    Broker --> Postgres[postgres.Broadcaster<br/>LISTEN/NOTIFY]
    Memory --> Sub[SSE Subscriber<br/>per-session]
    Postgres -->|cross-replica| Sub
    Sub -->|JSON-RPC 2.0<br/>over text/event-stream| Client[Claude.ai /<br/>Claude Desktop]

  • New pkg/session.Broadcaster interface with two implementations:
    • MemoryBroadcaster: single-replica or no-DB deployments
    • postgres.Broadcaster: LISTEN/NOTIFY for multi-replica deployments. Close drains with a 2 s bound so a stuck listener can't pin shutdown past the orchestrator's grace period. The channel name can be overridden via sessions.broadcast_channel for deployments sharing a postgres instance; the default is mcp_notifications, and a startup warning nudges operators to set a per-deployment channel when sharing
  • AwareHandler.handleSSE opens a long-lived SSE response on GET / Accept: text/event-stream:
    • Validates the session through the same store.Get + ownership path as POST: returns 500 on store error, 403 on ownership mismatch, 404 on missing — matching POST exactly so probing GET vs. POST cannot infer different facts
    • Subscribes to the broadcaster, streams every event as a JSON-RPC 2.0 notification with a 25 s comment-frame heartbeat (: keepalive\n\n) to survive proxy idle timeouts
  • Gateway toolkit's notifyToolListChanged fires on every aggregate tool-inventory change that registers or removes at least one tool. The call is debounced by 50 ms, longer than the SDK's 10 ms internal window, to absorb the postgres LISTEN/NOTIFY round-trip. A bounded dispatch context (5 s) prevents a partitioned downstream from leaking goroutines per inventory change
  • Platform.WireGatewayBroadcaster plugs the broadcaster into every gateway toolkit, called unconditionally from startHTTPServer so it applies even when admin API is disabled
  • isNilNotifier detects typed-nil interface values (Ptr, Interface, Func) so a future caller passing a nil pointer-receiver adapter doesn't nil-deref at fire time
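
Two of the smaller pieces above lend themselves to a short Go sketch: the SSE frames a subscriber would see on the wire, and the typed-nil check. Everything here is illustrative rather than the platform's code. The notification struct, the listChangedFrame helper, the notifier interface, and the adapter type are assumptions; only the method name notifications/tools/list_changed, the `: keepalive` comment frame, and the reflect kinds (Ptr, Interface, Func) come from the notes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// notification is a parameterless JSON-RPC 2.0 notification (no id field).
type notification struct {
	JSONRPC string `json:"jsonrpc"`
	Method  string `json:"method"`
}

// heartbeatFrame is an SSE comment frame. Clients ignore it, but it keeps
// idle proxies from closing the stream.
const heartbeatFrame = ": keepalive\n\n"

// listChangedFrame renders a tools/list_changed notification as a single
// SSE event: one "data:" line followed by a blank line.
func listChangedFrame() (string, error) {
	payload, err := json.Marshal(notification{
		JSONRPC: "2.0",
		Method:  "notifications/tools/list_changed",
	})
	if err != nil {
		return "", err
	}
	return "data: " + string(payload) + "\n\n", nil
}

// notifier and adapter are hypothetical stand-ins for the gateway's
// notifier interface and a pointer-receiver implementation of it.
type notifier interface{ Notify() }

type adapter struct{}

func (a *adapter) Notify() {}

// isNilNotifier reports whether n is nil, including the typed-nil case
// where the interface holds a nil pointer and a plain n == nil is false.
func isNilNotifier(n notifier) bool {
	if n == nil {
		return true
	}
	v := reflect.ValueOf(n)
	switch v.Kind() {
	case reflect.Ptr, reflect.Interface, reflect.Func:
		return v.IsNil()
	}
	return false
}

func main() {
	frame, err := listChangedFrame()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n%q\n", frame, heartbeatFrame)

	var typedNil *adapter
	fmt.Println(isNilNotifier(typedNil), isNilNotifier(&adapter{}), isNilNotifier(nil))
}
```

The typed-nil check matters because a nil *adapter stored in an interface compares unequal to nil, so a plain nil comparison would let it through and the first method call that dereferences the receiver would panic.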

New configuration

sessions:
  store: database
  # Override the postgres LISTEN/NOTIFY channel name. Defaults to
  # `mcp_notifications`. Set per deployment when multiple deployments
  # share a single postgres instance, to prevent cross-deployment
  # tools/list_changed fan-out.
  broadcast_channel: my_deployment_events

Validated up-front (only when store: database):

  • ≤63 bytes (postgres NAMEDATALEN-1 LISTEN identifier limit)
  • Postgres unquoted identifier grammar [A-Za-z_][A-Za-z0-9_$]*

Single-replica deployments and deployments without a database are unaffected — they use the in-memory broadcaster automatically.
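
The two validation rules for broadcast_channel could be checked roughly as follows. This is a sketch under assumptions: validateBroadcastChannel and its error wording are hypothetical, and it assumes the caller has already substituted the mcp_notifications default for an unset value, so an empty string is rejected here.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxChannelBytes is NAMEDATALEN-1 for a default postgres build.
const maxChannelBytes = 63

// channelPattern is the unquoted-identifier grammar quoted in the notes.
var channelPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_$]*$`)

// validateBroadcastChannel rejects names that postgres would truncate or
// that would need quoting in a LISTEN/NOTIFY statement.
func validateBroadcastChannel(name string) error {
	if len(name) > maxChannelBytes { // len counts bytes, not runes
		return fmt.Errorf("broadcast_channel %q is %d bytes; limit is %d",
			name, len(name), maxChannelBytes)
	}
	if !channelPattern.MatchString(name) {
		return fmt.Errorf("broadcast_channel %q is not a valid unquoted identifier", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"my_deployment_events", "9lives", "has-dash"} {
		fmt.Printf("%-22s %v\n", name, validateBroadcastChannel(name))
	}
}
```

Validating at startup means a bad channel name fails fast instead of surfacing later as a LISTEN error or as two deployments silently sharing a truncated channel name.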


Code-quality cleanup (bundled)

The two-bug fix touched many files; absorbing the codebase's outstanding goconst-violating duplicate-literal cleanup into the same release keeps the lint surface clean. ~50 files across pkg/admin, pkg/audit, pkg/middleware, pkg/memory, pkg/portal, pkg/persona, pkg/oauth, pkg/browsersession, pkg/resource, and pkg/toolkits/{knowledge,memory,portal} had repeated string literals extracted into named constants:

  • Audit, memory, portal, knowledge SQL column names (col*)
  • Admin redaction keys — now matches fieldcrypt's encryption set; TestSensitiveKeysCoverFieldcryptSet locks the invariant so a future divergence between encrypted-at-rest fields and admin-redacted fields fails a test (closes a latent secret-leak hole if ENCRYPTION_KEY were ever unset)
  • HTTP/MCP MIME types, status strings, role names, content-block types
  • OIDC/OAuth claim and parameter names
  • DataHub entity types and assertion statuses
  • Platform config keys exported as ConfigKeyServerDescription, ConfigKeyServerAgentInstructions, ConfigKeyToolsDeny so admin reads and platform writes use the same canonical wire keys

No behavior changes — purely a maintainability pass.


Upgrade notes

  • All deployments: zero-downtime upgrade. Restart pods normally
  • Deployments with authorization_code gateway upstreams: click Reconnect once per existing connection after upgrade. The admin Status surface flags which connections need it
  • Multi-replica deployments sharing a single postgres instance: set sessions.broadcast_channel to a deployment-unique value before upgrade to prevent cross-deployment tools/list_changed fan-out. The platform logs a warning at startup if the default channel is in use
  • Single-replica or memory-store deployments: no config changes required

Verification

make verify passes:

  • gofmt, go vet, race tests
  • ≥80% total + patch coverage
  • golangci-lint (cyclomatic ≤10, cognitive ≤15)
  • gosec, govulncheck, semgrep, CodeQL (security-and-quality)
  • Dead-code analysis
  • Mutation testing ≥60% kill rate
  • goreleaser dry-run
  • Documentation completeness check

Changelog

  • bc0ba67: fix(gateway): persistent re-auth + live tools/list_changed propagation (#360) (@cjimti)

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v1.58.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_1.58.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.58.0_linux_amd64.tar.gz