Release mcp-data-platform-v1.58.0 · txn2/mcp-data-platform

This release fixes two production bugs in the gateway toolkit and bundles a broad code-quality cleanup pass.

TL;DR — what changed and what to do

| Area | Before v1.58.0 | After v1.58.0 |
| --- | --- | --- |
| OAuth `authorization_code` upstreams | Refresh token died ~30 min after the SSO session went idle. Every pod restart forced a manual Reconnect. | Refresh tokens survive platform restarts. One Reconnect per upstream after upgrade and you're done. |
| Tool list freshness | When a gateway upstream re-authenticated or a connection was added/removed, downstream agents showed the stale tool list until the user disconnected and reconnected. | Downstream agents (Claude.ai, Claude Desktop) receive `notifications/tools/list_changed` live over a long-lived SSE channel. No reconnect required. |

Operator action on upgrade: click Reconnect once per existing authorization_code gateway connection in the admin portal. Pre-existing connections still hold refresh tokens issued under the old (no-offline_access) flow; one Reconnect re-issues them under the new augmented scope. The admin Status surface shows an actionable hint when this is the case.

Bug A — Re-auth required after every server restart

authorization_code upstream connections were storing refresh tokens issued without the offline_access scope. When offline_access is missing, Keycloak (and similar IdPs) tie the refresh-token lifetime to the user's interactive SSO Session Idle timeout (~30 min by default), so the token expired once that idle window passed and the next pod restart forced a manual Reconnect.

What changed

parseOAuthConfig now defaults offline_access into the requested scope for authorization_code grants:
- Empty scope → openid profile email offline_access
- Non-empty scope → append offline_access if missing (case-sensitive per RFC 6749 §3.3)
- client_credentials is unchanged — no SSO session to outlive
OAuthConfig.OriginalScope + OAuthConfig.ScopeAugmented preserve the operator-supplied pre-augmentation scope so admin status can distinguish the two upgrade scenarios:
- "Operator never asked for offline_access" → Reconnect will fix. Admin status shows: refresh rejected: persisted refresh grant predates the offline_access default; reauthorize so the IdP issues a token under the augmented scope (and ensure the IdP client has offline_access assignable)
- "Operator asked, IdP still rejected" → IdP misconfigured. The upstream rejection message is preserved verbatim
The hint reaches the admin status surface even for the legacy un-customized case (operator never set oauth_scope at all — the most common upgrade path)
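The scope rules above can be sketched in a few lines of Go. This is a minimal illustration, not the project's parseOAuthConfig: the helper name augmentScope and its signature are hypothetical, and treating an empty scope as "augmented" is an assumption based on the note that the hint also reaches the un-customized case.

```go
package main

import (
	"fmt"
	"strings"
)

// augmentScope is a hypothetical helper, not the project's parseOAuthConfig.
// It returns the scope to request and whether offline_access was added.
func augmentScope(grant, scope string) (string, bool) {
	const offline = "offline_access"
	if grant != "authorization_code" {
		// client_credentials has no SSO session to outlive; leave the scope alone.
		return scope, false
	}
	fields := strings.Fields(scope)
	if len(fields) == 0 {
		// Empty scope: fall back to the documented default set.
		return "openid profile email " + offline, true
	}
	for _, f := range fields {
		if f == offline { // case-sensitive comparison, per RFC 6749 §3.3
			return scope, false
		}
	}
	return strings.Join(append(fields, offline), " "), true
}

func main() {
	for _, s := range []string{"", "openid", "openid offline_access"} {
		got, augmented := augmentScope("authorization_code", s)
		fmt.Printf("%q -> %q (augmented=%v)\n", s, got, augmented)
	}
}
```

Returning the "was it augmented" flag alongside the scope mirrors the role the notes give to OriginalScope and ScopeAugmented: the caller can later tell "operator never asked" apart from "operator asked and the IdP refused".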

Keycloak setup note

For the augmentation to actually issue offline tokens, the offline_access client scope must be Optional or Default on the gateway client:

Realm → Clients → <gateway client> → Client Scopes → Setup → ensure offline_access is listed under Optional or Default

The user completing the Connect flow must also have the offline_access realm role (granted by default in most realms).

Bug B — Downstream agents miss `tools/list_changed` in stateless mode

The Go MCP SDK refuses GET-for-SSE and closes sessions at end-of-request when running in stateless streamable HTTP mode (the production multi-replica deployment shape). The result: when a gateway upstream re-authenticated after Bug A's fix landed, downstream agents kept showing the stale tool list because the SDK's native push channel was disabled.

What changed — broadcaster pipeline replacing the SDK's missing native push

graph LR
    GW[Gateway toolkit<br/>AddTool / RemoveTool] -->|debounced 50ms| NotifyAdapter[gatewayListChangedNotifier]
    NotifyAdapter --> Broker[session.Broadcaster]
    Broker --> Memory[MemoryBroadcaster<br/>single-replica]
    Broker --> Postgres[postgres.Broadcaster<br/>LISTEN/NOTIFY]
    Memory --> Sub[SSE Subscriber<br/>per-session]
    Postgres -->|cross-replica| Sub
    Sub -->|JSON-RPC 2.0<br/>over text/event-stream| Client[Claude.ai /<br/>Claude Desktop]

New pkg/session.Broadcaster interface with two implementations:
- MemoryBroadcaster — single-replica or no-DB deployments
- postgres.Broadcaster — LISTEN/NOTIFY for multi-replica deployments. Close drains for at most 2 s, so a stuck listener can't pin shutdown past the orchestrator's grace period. The channel name can be overridden via sessions.broadcast_channel for deployments sharing a postgres instance; the default is mcp_notifications, and a startup warning nudges operators to set a per-deployment channel when sharing
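To make the single-replica case concrete, here is a minimal in-memory fan-out sketch. The real pkg/session.Broadcaster interface is not shown in these notes, so the interface shape, the method names Subscribe and Publish, the string event type, and the buffer size are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Broadcaster is a hypothetical shape; the real pkg/session.Broadcaster
// interface is not shown in these notes.
type Broadcaster interface {
	Subscribe() (events <-chan string, cancel func())
	Publish(event string)
}

// memoryBroadcaster fans events out to subscribers within one process.
type memoryBroadcaster struct {
	mu   sync.Mutex
	subs map[chan string]struct{}
}

func newMemoryBroadcaster() *memoryBroadcaster {
	return &memoryBroadcaster{subs: make(map[chan string]struct{})}
}

// Subscribe registers a buffered channel and returns it with a cancel func.
func (b *memoryBroadcaster) Subscribe() (<-chan string, func()) {
	ch := make(chan string, 8) // buffer size is an arbitrary choice
	b.mu.Lock()
	b.subs[ch] = struct{}{}
	b.mu.Unlock()
	cancel := func() {
		b.mu.Lock()
		delete(b.subs, ch)
		b.mu.Unlock()
	}
	return ch, cancel
}

// Publish delivers the event to every current subscriber without blocking.
func (b *memoryBroadcaster) Publish(event string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for ch := range b.subs {
		select {
		case ch <- event:
		default: // drop rather than block on a slow subscriber
		}
	}
}

func main() {
	var b Broadcaster = newMemoryBroadcaster()
	events, cancel := b.Subscribe()
	defer cancel()
	b.Publish("notifications/tools/list_changed")
	fmt.Println(<-events)
}
```

A cross-replica implementation would keep the same interface and swap the map of channels for a LISTEN/NOTIFY connection, which is why the notes describe the two as interchangeable implementations.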
AwareHandler.handleSSE opens a long-lived SSE response on GET / Accept: text/event-stream:
- Validates the session through the same store.Get + ownership path as POST: returns 500 on store error, 403 on ownership mismatch, 404 on missing — matching POST exactly so probing GET vs. POST cannot infer different facts
- Subscribes to the broadcaster, streams every event as a JSON-RPC 2.0 notification with a 25 s comment-frame heartbeat (: keepalive\n\n) to survive proxy idle timeouts
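The wire format described above can be illustrated with a small framing sketch. The helper name sseFrame and the notification struct are hypothetical, and the exact framing the platform emits (for example whether it also sets an `event:` field) is an assumption; only the keepalive comment frame is quoted from the notes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// keepaliveFrame is the SSE comment frame quoted in the notes above.
// Lines starting with ':' are ignored by SSE clients but keep proxies awake.
const keepaliveFrame = ": keepalive\n\n"

// notification is a JSON-RPC 2.0 notification: it carries no "id" field.
type notification struct {
	JSONRPC string `json:"jsonrpc"`
	Method  string `json:"method"`
}

// sseFrame is a hypothetical helper that wraps one notification in one SSE event.
func sseFrame(method string) (string, error) {
	b, err := json.Marshal(notification{JSONRPC: "2.0", Method: method})
	if err != nil {
		return "", err
	}
	// An SSE event is one or more "data:" lines terminated by a blank line.
	return "data: " + string(b) + "\n\n", nil
}

func main() {
	frame, err := sseFrame("notifications/tools/list_changed")
	if err != nil {
		panic(err)
	}
	fmt.Print(frame, keepaliveFrame)
}
```

In a real handler these strings would be written to the response and flushed after each write, with the keepalive frame sent from a 25 s ticker whenever no event is pending.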
Gateway toolkit's notifyToolListChanged fires (50 ms debounced — longer than the SDK's 10 ms internal window to absorb the postgres LISTEN/NOTIFY round-trip) on every aggregate tool-inventory change that registers or removes at least one tool. Bounded dispatch context (5 s) prevents a partitioned downstream from leaking goroutines per inventory change
Platform.WireGatewayBroadcaster plugs the broadcaster into every gateway toolkit, called unconditionally from startHTTPServer so it applies even when admin API is disabled
isNilNotifier detects typed-nil interface values (Ptr, Interface, Func) so a future caller passing a nil pointer-receiver adapter doesn't nil-deref at fire time
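The typed-nil problem that isNilNotifier guards against is a common Go pitfall, so a short sketch may help. The notifier interface, the adapter type, and the isNil helper below are hypothetical stand-ins; the project's actual isNilNotifier signature is not shown in these notes.

```go
package main

import (
	"fmt"
	"reflect"
)

// notifier is a hypothetical stand-in for the gateway's notifier interface.
type notifier interface{ Notify() }

type adapter struct{}

func (a *adapter) Notify() {}

// isNil reports whether n is nil or wraps a nil pointer, interface, or func.
func isNil(n notifier) bool {
	if n == nil {
		return true
	}
	v := reflect.ValueOf(n)
	switch v.Kind() {
	case reflect.Ptr, reflect.Interface, reflect.Func:
		return v.IsNil()
	default:
		return false
	}
}

func main() {
	var p *adapter     // nil pointer
	var n notifier = p // non-nil interface value holding a nil pointer
	fmt.Println(n == nil, isNil(n), isNil(&adapter{}))
	// prints: false true false
}
```

The plain `n == nil` comparison is false here because the interface value still carries a concrete type, which is why a reflection-based check is needed before the notifier is invoked.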

New configuration

sessions:
  store: database
  # Override the postgres LISTEN/NOTIFY channel name. Defaults to
  # `mcp_notifications`. Set per deployment when multiple deployments
  # share a single postgres instance, to prevent cross-deployment
  # tools/list_changed fan-out.
  broadcast_channel: my_deployment_events

Validated up-front (only when store: database):

≤63 bytes (postgres NAMEDATALEN-1 LISTEN identifier limit)
Postgres unquoted identifier grammar [A-Za-z_][A-Za-z0-9_$]*
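As a rough illustration, the two checks can be expressed as follows. The function name validateChannel and the error wording are hypothetical; only the 63-byte limit and the identifier grammar come from the notes. This sketch rejects an empty name, whereas the platform presumably treats an unset value as "use the default".

```go
package main

import (
	"fmt"
	"regexp"
)

// maxChannelBytes is NAMEDATALEN-1, the postgres identifier length limit.
const maxChannelBytes = 63

// channelRe matches the unquoted identifier grammar quoted in the notes.
var channelRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_$]*$`)

// validateChannel is a hypothetical helper mirroring the two documented checks.
func validateChannel(name string) error {
	if len(name) > maxChannelBytes { // len counts bytes, not runes
		return fmt.Errorf("channel %q is %d bytes; limit is %d", name, len(name), maxChannelBytes)
	}
	if !channelRe.MatchString(name) {
		return fmt.Errorf("channel %q is not an unquoted postgres identifier", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"my_deployment_events", "9lives", "has-dash"} {
		fmt.Printf("%s: %v\n", name, validateChannel(name))
	}
}
```

Restricting names to the unquoted grammar means the channel can be interpolated into a LISTEN statement without quoting.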

Single-replica deployments and deployments without a database are unaffected — they use the in-memory broadcaster automatically.

Code-quality cleanup (bundled)

The two-bug fix touched many files, so this release also absorbs the outstanding cleanup of duplicate string literals flagged by goconst, which keeps the lint surface clean. Roughly 50 files across pkg/admin, pkg/audit, pkg/middleware, pkg/memory, pkg/portal, pkg/persona, pkg/oauth, pkg/browsersession, pkg/resource, and pkg/toolkits/{knowledge,memory,portal} had repeated string literals extracted into named constants:

Audit, memory, portal, knowledge SQL column names (col*)
Admin redaction keys, which now match fieldcrypt's encryption set; TestSensitiveKeysCoverFieldcryptSet locks the invariant so a future divergence between encrypted-at-rest fields and admin-redacted fields fails a test (closes a latent secret-leak hole if ENCRYPTION_KEY were ever unset)
HTTP/MCP MIME types, status strings, role names, content-block types
OIDC/OAuth claim and parameter names
DataHub entity types and assertion statuses
Platform config keys exported as ConfigKeyServerDescription, ConfigKeyServerAgentInstructions, ConfigKeyToolsDeny so admin reads and platform writes use the same canonical wire keys

No behavior changes — purely a maintainability pass.

Upgrade notes

All deployments: zero-downtime upgrade. Restart pods normally
Deployments with authorization_code gateway upstreams: click Reconnect once per existing connection after upgrade. The admin Status surface flags which connections need it
Multi-replica deployments sharing a single postgres instance: set sessions.broadcast_channel to a deployment-unique value before upgrade to prevent cross-deployment tools/list_changed fan-out. The platform logs a warning at startup if the default channel is in use
Single-replica or memory-store deployments: no config changes required

Verification

make verify passes:

gofmt, go vet, race tests
≥80% total + patch coverage
golangci-lint (cyclomatic ≤10, cognitive ≤15)
gosec, govulncheck, semgrep, CodeQL (security-and-quality)
Dead-code analysis
Mutation testing ≥60% kill rate
goreleaser dry-run
Documentation completeness check

Changelog

bc0ba67: fix(gateway): persistent re-auth + live tools/list_changed propagation (#360) (@cjimti)

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v1.58.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_1.58.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.58.0_linux_amd64.tar.gz
