mcp-data-platform-v1.58.0
Two production bugs in the gateway toolkit are fixed in this release, plus a wide-scope code-quality cleanup pass.
TL;DR — what changed and what to do
| | Before v1.58.0 | After v1.58.0 |
|---|---|---|
| OAuth `authorization_code` upstreams | Refresh token died ~30 min after the SSO session went idle. Every pod restart forced a manual Reconnect. | Refresh tokens survive platform restarts. One Reconnect per upstream after upgrade and you're done. |
| Tool list freshness | When a gateway upstream re-authenticated or a connection was added/removed, downstream agents showed the stale tool list until the user disconnected and reconnected. | Downstream agents (Claude.ai, Claude Desktop) receive notifications/tools/list_changed live over a long-lived SSE channel. No reconnect required. |
Operator action on upgrade: click Reconnect once per existing authorization_code gateway connection in the admin portal. Pre-existing connections still hold refresh tokens issued under the old (no-offline_access) flow; one Reconnect re-issues them under the new augmented scope. The admin Status surface shows an actionable hint when this is the case.
Bug A — Re-auth required after every server restart
authorization_code upstream connections were storing refresh tokens issued without the offline_access scope. Keycloak (and similar IdPs) tie the refresh-token lifetime to the user's interactive SSO Session Idle (~30 min default) when offline_access is missing, so any pod restart longer than that idle window invalidated the refresh token.
What changed
- `parseOAuthConfig` now defaults `offline_access` into the requested scope for `authorization_code` grants:
  - Empty scope → `openid profile email offline_access`
  - Non-empty scope → append `offline_access` if missing (case-sensitive per RFC 6749 §3.3)
  - `client_credentials` is unchanged (no SSO session to outlive)
- `OAuthConfig.OriginalScope` + `OAuthConfig.ScopeAugmented` preserve the operator-supplied pre-augmentation scope so admin status can distinguish the two upgrade scenarios:
  - "Operator never asked for `offline_access`" → Reconnect will fix. Admin status shows: `refresh rejected: persisted refresh grant predates the offline_access default; reauthorize so the IdP issues a token under the augmented scope (and ensure the IdP client has offline_access assignable)`
  - "Operator asked, IdP still rejected" → IdP misconfigured. The upstream rejection message is preserved verbatim
- The hint reaches the admin status surface even for the legacy un-customized case (operator never set `oauth_scope` at all, which is the most common upgrade path)
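The scope rules above can be sketched in a few lines of Go. This is a minimal illustration, not the platform's actual `parseOAuthConfig`: the helper name `augmentScope` and its return shape are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	offlineAccess = "offline_access"
	defaultScope  = "openid profile email offline_access"
)

// augmentScope is a hypothetical helper (not the platform's real code).
// It returns the scope to request and whether it differs from the
// operator-supplied value.
func augmentScope(grant, scope string) (string, bool) {
	// client_credentials has no SSO session to outlive, so leave it alone.
	if grant != "authorization_code" {
		return scope, false
	}
	fields := strings.Fields(scope)
	if len(fields) == 0 {
		return defaultScope, true
	}
	for _, f := range fields {
		// Scope tokens are case-sensitive (RFC 6749 §3.3), so compare exactly.
		if f == offlineAccess {
			return scope, false
		}
	}
	return strings.Join(append(fields, offlineAccess), " "), true
}

func main() {
	for _, in := range []string{"", "openid", "openid offline_access", "openid OFFLINE_ACCESS"} {
		out, changed := augmentScope("authorization_code", in)
		fmt.Printf("%q -> %q (augmented=%v)\n", in, out, changed)
	}
}
```

Returning the "was it changed" flag alongside the scope mirrors the role the notes describe for `OriginalScope` and `ScopeAugmented`: the caller can later tell an operator who never asked for `offline_access` apart from one who did.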
Keycloak setup note
For the augmentation to actually issue offline tokens, the offline_access client scope must be Optional or Default on the gateway client:
Realm → Clients → `<gateway client>` → Client Scopes → Setup → ensure `offline_access` is listed under Optional or Default
The user completing the Connect flow must also have the offline_access realm role (granted by default in most realms).
Bug B — Downstream agents miss tools/list_changed in stateless mode
The Go MCP SDK refuses GET-for-SSE and closes sessions at end-of-request when running in stateless streamable HTTP mode (the production multi-replica deployment shape). The result: when a gateway upstream re-authenticated after Bug A's fix landed, downstream agents kept showing the stale tool list because the SDK's native push channel was disabled.
What changed — broadcaster pipeline replacing the SDK's missing native push
```mermaid
graph LR
    GW[Gateway toolkit<br/>AddTool / RemoveTool] -->|debounced 50ms| NotifyAdapter[gatewayListChangedNotifier]
    NotifyAdapter --> Broker[session.Broadcaster]
    Broker --> Memory[MemoryBroadcaster<br/>single-replica]
    Broker --> Postgres[postgres.Broadcaster<br/>LISTEN/NOTIFY]
    Memory --> Sub[SSE Subscriber<br/>per-session]
    Postgres -->|cross-replica| Sub
    Sub -->|JSON-RPC 2.0<br/>over text/event-stream| Client[Claude.ai /<br/>Claude Desktop]
```

- New `pkg/session.Broadcaster` interface with two implementations:
  - `MemoryBroadcaster`: single-replica or no-DB deployments
  - `postgres.Broadcaster`: `LISTEN/NOTIFY` for multi-replica deployments. Bounded `Close` drain (2 s) so a stuck listener can't pin shutdown past the orchestrator's grace period. Channel name override via `sessions.broadcast_channel` for deployments sharing a postgres instance; the default is `mcp_notifications`, with a startup warning nudging operators to set a per-deployment channel when sharing
- `AwareHandler.handleSSE` opens a long-lived SSE response on `GET` / `Accept: text/event-stream`:
  - Validates the session through the same `store.Get + ownership` path as POST: returns 500 on store error, 403 on ownership mismatch, 404 on missing, matching POST exactly so probing GET vs. POST cannot infer different facts
  - Subscribes to the broadcaster, streams every event as a JSON-RPC 2.0 notification with a 25 s comment-frame heartbeat (`: keepalive\n\n`) to survive proxy idle timeouts
- Gateway toolkit's `notifyToolListChanged` fires (50 ms debounced, longer than the SDK's 10 ms internal window to absorb the postgres `LISTEN/NOTIFY` round-trip) on every aggregate tool-inventory change that registers or removes at least one tool. Bounded dispatch context (5 s) prevents a partitioned downstream from leaking goroutines per inventory change
- `Platform.WireGatewayBroadcaster` plugs the broadcaster into every gateway toolkit, called unconditionally from `startHTTPServer` so it applies even when admin API is disabled
- `isNilNotifier` detects typed-nil interface values (Ptr, Interface, Func) so a future caller passing a nil pointer-receiver adapter doesn't nil-deref at fire time
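For anyone debugging the SSE channel with a raw HTTP client, it may help to see roughly what travels over the wire. The sketch below builds one notification frame and the heartbeat frame. The heartbeat text comes from the notes above; the `data:`-only framing (no `event:` or `id:` line), the `notification` type, and the `sseFrame` helper are assumptions for illustration, not the platform's actual encoder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// notification is a JSON-RPC 2.0 notification: it carries no "id" field,
// so the receiver never replies to it.
type notification struct {
	JSONRPC string `json:"jsonrpc"`
	Method  string `json:"method"`
}

// heartbeatFrame is an SSE comment line. Clients ignore it, but it keeps
// idle proxies from closing the connection.
const heartbeatFrame = ": keepalive\n\n"

// sseFrame (hypothetical helper) wraps a notification in a single SSE
// "data:" field terminated by a blank line.
func sseFrame(method string) (string, error) {
	payload, err := json.Marshal(notification{JSONRPC: "2.0", Method: method})
	if err != nil {
		return "", err
	}
	return "data: " + string(payload) + "\n\n", nil
}

func main() {
	frame, err := sseFrame("notifications/tools/list_changed")
	if err != nil {
		panic(err)
	}
	fmt.Print(frame)
	fmt.Print(heartbeatFrame)
}
```

A client that sees only `: keepalive` lines is connected but has had no inventory changes since it subscribed.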
New configuration
```yaml
sessions:
  store: database
  # Override the postgres LISTEN/NOTIFY channel name. Defaults to
  # `mcp_notifications`. Set per deployment when multiple deployments
  # share a single postgres instance, to prevent cross-deployment
  # tools/list_changed fan-out.
  broadcast_channel: my_deployment_events
```

Validated up-front (only when `store: database`):
- ≤63 bytes (postgres `NAMEDATALEN-1` LISTEN identifier limit)
- Postgres unquoted identifier grammar `[A-Za-z_][A-Za-z0-9_$]*`
Single-replica deployments and deployments without a database are unaffected — they use the in-memory broadcaster automatically.
Code-quality cleanup (bundled)
The two-bug fix touched many files; absorbing the codebase's outstanding goconst-violating duplicate-literal cleanup into the same release keeps the lint surface clean. ~50 files across pkg/admin, pkg/audit, pkg/middleware, pkg/memory, pkg/portal, pkg/persona, pkg/oauth, pkg/browsersession, pkg/resource, and pkg/toolkits/{knowledge,memory,portal} had repeated string literals extracted into named constants:
- Audit, memory, portal, knowledge SQL column names (`col*`)
- Admin redaction keys: these now match `fieldcrypt`'s encryption set; `TestSensitiveKeysCoverFieldcryptSet` locks the invariant so a future divergence between encrypted-at-rest fields and admin-redacted fields fails a test (closes a latent secret-leak hole if `ENCRYPTION_KEY` were ever unset)
- HTTP/MCP MIME types, status strings, role names, content-block types
- OIDC/OAuth claim and parameter names
- DataHub entity types and assertion statuses
- Platform config keys exported as `ConfigKeyServerDescription`, `ConfigKeyServerAgentInstructions`, `ConfigKeyToolsDeny` so admin reads and platform writes use the same canonical wire keys
No behavior changes — purely a maintainability pass.
Upgrade notes
- All deployments: zero-downtime upgrade. Restart pods normally
- Deployments with `authorization_code` gateway upstreams: click Reconnect once per existing connection after upgrade. The admin Status surface flags which connections need it
- Multi-replica deployments sharing a single postgres instance: set `sessions.broadcast_channel` to a deployment-unique value before upgrade to prevent cross-deployment `tools/list_changed` fan-out. The platform logs a warning at startup if the default channel is in use
- Single-replica or memory-store deployments: no config changes required
Verification
`make verify` passes:
- `gofmt`, `go vet`, race tests
- ≥80% total + patch coverage
- `golangci-lint` (cyclomatic ≤10, cognitive ≤15)
- `gosec`, `govulncheck`, `semgrep`, CodeQL (security-and-quality)
- Dead-code analysis
- Mutation testing ≥60% kill rate
- `goreleaser` dry-run
- Documentation completeness check
Changelog
Installation
Homebrew (macOS)
```
brew install txn2/tap/mcp-data-platform
```
Claude Code CLI
```
claude mcp add mcp-data-platform -- mcp-data-platform
```
Docker
```
docker pull ghcr.io/txn2/mcp-data-platform:v1.58.0
```
Verification
All release artifacts are signed with Cosign. Verify with:
```
cosign verify-blob --bundle mcp-data-platform_1.58.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.58.0_linux_amd64.tar.gz
```