mcp-data-platform-v1.61.0

@github-actions github-actions released this 13 May 22:09
fd7f907

Highlights

This release makes OAuth-backed connections (MCP gateway and HTTP API gateway) survive arbitrary periods of inactivity, surface every refresh / rotation / revocation in a durable history, and never lose a token silently again.

If you operate any auth_mode: oauth connection through this platform, read the migration notes below — particularly if your upstream uses wall-clock refresh token expiry or one-time-use rotation.

What's new

Background OAuth refresh (the keepalive)

A new platform-managed loop refreshes every connection's tokens before they expire, independent of inbound traffic. Connections used once a week (or once a month) now stay alive as long as the IdP itself permits — no operator action required between uses.

Defaults:

  • Loop tick: every 5 minutes
  • Refresh when access token is within 5 minutes of expiry
  • Refresh when IdP-disclosed refresh deadline is within 1 hour
  • Refresh-token wall-clock maximum: unset unless the operator configures the new oauth2_refresh_max_lifetime connection field
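
The lead-time defaults above can be read as a small decision rule evaluated on each loop tick. The following is a minimal sketch of that rule, not the platform's code: the Policy type, its field names, and the Due method are hypothetical, and treating an operator-configured maximum the same way as an IdP-disclosed deadline is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Policy holds the two lead times from the documented defaults.
// The type and field names are illustrative only.
type Policy struct {
	AccessLead  time.Duration // refresh once the access token is this close to expiry
	RefreshLead time.Duration // refresh once the refresh deadline is this close
}

// DefaultPolicy mirrors the documented defaults: 5 minutes and 1 hour.
var DefaultPolicy = Policy{AccessLead: 5 * time.Minute, RefreshLead: time.Hour}

// Due reports whether a refresh should fire on this tick.
// accessLeft is the time remaining until the access token expires.
// refreshLeft is the time remaining until the refresh deadline and is
// only consulted when hasDeadline is true (a deadline is known).
func (p Policy) Due(accessLeft, refreshLeft time.Duration, hasDeadline bool) bool {
	if accessLeft <= p.AccessLead {
		return true
	}
	return hasDeadline && refreshLeft <= p.RefreshLead
}

func main() {
	p := DefaultPolicy
	// Access token has 30 minutes left and no refresh deadline is known.
	fmt.Println(p.Due(30*time.Minute, 0, false))
	// Access token has 4 minutes left.
	fmt.Println(p.Due(4*time.Minute, 0, false))
	// Access token is fine, but the refresh deadline is 45 minutes away.
	fmt.Println(p.Due(30*time.Minute, 45*time.Minute, true))
}
```

With a 5-minute tick and a 5-minute access lead, a rule of this shape gets at least one chance to fire before an access token lapses, as long as the loop is running.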

Per-IdP guidance:

IdP | Disclosed deadline | Set oauth2_refresh_max_lifetime?
Keycloak / Auth0 / Okta | refresh_expires_in | optional
Google (Workspace, Drive, Gmail, GCP) | no; invalidated by ~6mo inactivity OR user revocation | no; access-token cadence is the keepalive
Salesforce | depends on connected-app policy | 30d (or your policy window) when policy is "expire after inactivity"
IdPs with hard wall-clock expiry and mandatory one-time-use rotation | no; hard wall clock with mandatory rotation | set to the IdP's documented maximum; required to keep the connection alive
Microsoft Graph / Entra | sliding 90d max | 90d recommended

Multi-replica: advisory locks via pg_try_advisory_lock keyed on (kind, name) ensure two replicas don't race-refresh the same connection. Single-replica deployments get a no-op locker automatically.
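
PostgreSQL's pg_try_advisory_lock takes integer keys, so a (kind, name) pair has to be folded into a number first. The sketch below shows one way that could be done. The lockKey helper, the FNV-1a hash, and the zero-byte separator are assumptions for illustration; the release notes do not say how the platform derives its key.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// lockKey folds (kind, name) into the single bigint accepted by
// pg_try_advisory_lock(bigint). FNV-1a is an assumed choice here;
// any stable 64-bit hash shared by all replicas would serve.
func lockKey(kind, name string) int64 {
	h := fnv.New64a()
	h.Write([]byte(kind))
	h.Write([]byte{0}) // separator so ("a", "bc") and ("ab", "c") hash differently
	h.Write([]byte(name))
	return int64(h.Sum64())
}

// tryLockSQL returns true only for the session that acquires the lock;
// other sessions get false immediately instead of blocking.
const tryLockSQL = "SELECT pg_try_advisory_lock($1)"

func main() {
	fmt.Println(tryLockSQL, lockKey("mcp", "example"))
}
```

Advisory locks taken this way are session-scoped, so the matching pg_advisory_unlock must run on the same database connection that acquired the lock.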

Auth event history (90-day audit trail)

Every connect, refresh, rotation, revocation, and admin deletion is now recorded in the new connection_auth_events table (migration 000040). The portal renders the most recent 30 events under each connection's OAuth status card in a collapsible History panel — operators can answer "what happened to this token last Tuesday at 10:42?" without scraping pod logs.

Event types are a closed set (connect_started, connect_completed, refresh_succeeded, refresh_failed_transient, refresh_failed_revoked, refresh_rotation_persistence_failed, token_deleted_revoked, token_deleted_admin). Detail payloads carry timing, durations, the RFC 6749 error field, and rotation booleans — but never tokens, never IdP error_description strings (which can carry user identifiers on some IdPs), never DB driver wrapping errors.

Retention: 90 days, pruned daily by a background goroutine.

Admin endpoint: GET /api/v1/admin/connections/{kind}/{name}/auth-events — same auth gate as the rest of the admin surface.
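
A client that wants this history outside the portal only needs to fill in the two path parameters. The helper below is a hedged sketch of building that path. The authEventsPath name is hypothetical, and whether the platform accepts percent-escaped connection names is an assumption. Authentication is omitted because the notes only say the endpoint shares the admin auth gate.

```go
package main

import (
	"fmt"
	"net/url"
)

// authEventsPath builds the admin path for one connection's auth events.
// Each segment is escaped so a name containing "/" cannot change the route.
func authEventsPath(kind, name string) string {
	return "/api/v1/admin/connections/" +
		url.PathEscape(kind) + "/" +
		url.PathEscape(name) + "/auth-events"
}

func main() {
	fmt.Println(authEventsPath("mcp", "example"))
}
```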

Observable revocations (no more silent token loss)

Previously, when an IdP rejected a refresh token, the platform deleted the row and returned ErrNeedsReauth with no log line, no audit event, nothing. The OAuth status card showed "Token: not yet acquired" — indistinguishable from a never-connected connection. Operators were left guessing whether the token had ever existed, when it died, and why.

Three call sites previously did _ = store.Delete(...) and returned. Each now:

  1. Emits refresh_failed_revoked with the cause encoded in IDPErrorCode.
  2. Attempts store.Delete. On failure -> slog.Warn and return. On success -> slog.Info("token row deleted: refresh rejected by IdP") and emit token_deleted_revoked.

The INFO log lands after the delete succeeds, so audit trails never falsely claim a deletion that didn't happen.
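
The ordering described above (emit, then delete, then log and emit again only on success) can be sketched as follows. This is an illustration under stated assumptions, not the platform's code: the tokenStore and eventSink interfaces, the handleRevoked function, and the recorder fake are hypothetical names, and returning ErrNeedsReauth on both paths is assumed from the pre-release behavior described earlier.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// Stand-ins for the platform's token store and event writer.
// These interface and method names are hypothetical.
type tokenStore interface {
	Delete(kind, name string) error
}
type eventSink interface {
	Emit(eventType, idpErrorCode string)
}

var ErrNeedsReauth = errors.New("connection needs re-authorization")

// handleRevoked records the rejection first, then attempts the delete,
// and claims the deletion only after the delete has succeeded.
func handleRevoked(store tokenStore, events eventSink, kind, name, idpErrorCode string) error {
	events.Emit("refresh_failed_revoked", idpErrorCode)
	if err := store.Delete(kind, name); err != nil {
		slog.Warn("token row delete failed after IdP rejection",
			"kind", kind, "name", name, "error", err)
		return ErrNeedsReauth
	}
	slog.Info("token row deleted: refresh rejected by IdP", "kind", kind, "name", name)
	events.Emit("token_deleted_revoked", idpErrorCode)
	return ErrNeedsReauth
}

// recorder is an in-memory fake that implements both interfaces.
type recorder struct {
	failDelete bool
	events     []string
}

func (r *recorder) Delete(kind, name string) error {
	if r.failDelete {
		return errors.New("simulated delete failure")
	}
	return nil
}

func (r *recorder) Emit(eventType, idpErrorCode string) {
	r.events = append(r.events, eventType)
}

func main() {
	ok := &recorder{}
	_ = handleRevoked(ok, ok, "mcp", "example", "invalid_grant")
	fmt.Println(ok.events)

	bad := &recorder{failDelete: true}
	_ = handleRevoked(bad, bad, "mcp", "example", "invalid_grant")
	fmt.Println(bad.events)
}
```

In the failing case the fake records only refresh_failed_revoked, which is the point of the ordering: no token_deleted_revoked event exists for a row that is still present.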

For IdPs with one-time-use rotation, a successful refresh whose persist call fails means permanent credential loss: the IdP has already invalidated the old refresh token, and the new one was never stored. That case now logs at slog ERROR level and writes a refresh_rotation_persistence_failed event, so the page-worthy condition is visible before the next tool call exposes the dead connection.

Three-state OAuth status card

The OAuth status card on every connection settings page now distinguishes three states instead of conflating two of them:

  • Never connected — no token row has ever existed.
  • Revoked — token row existed and was deleted because the IdP rejected the refresh. The card shows the IdP host, the timestamp, and the machine-readable reason (invalid_grant, no_refresh_token, refresh_expired).
  • Connected — valid token row present; status grid renders as before.

Before this release, "Revoked" looked exactly like "Never connected" — the card just said "Click Connect to authorize this connection in your browser." Now the revoked state surfaces the actual cause and timestamp.
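
One way to picture the three states is as a function of two inputs: whether a valid token row exists, and whether a revocation has been recorded. The sketch below assumes exactly that. The cardState type, the cardStateFor function, and the use of an empty reason string to mean "no recorded revocation" are hypothetical, and the notes do not describe how the portal actually derives the state.

```go
package main

import "fmt"

type cardState string

const (
	neverConnected cardState = "never_connected"
	revoked        cardState = "revoked"
	connected      cardState = "connected"
)

// cardStateFor picks the state to render. revokedReason is the
// machine-readable reason from the most recent revocation (for example
// "invalid_grant"), or "" when no revocation has been recorded.
func cardStateFor(tokenPresent bool, revokedReason string) cardState {
	switch {
	case tokenPresent:
		return connected
	case revokedReason != "":
		return revoked
	default:
		return neverConnected
	}
}

func main() {
	fmt.Println(cardStateFor(false, ""))
	fmt.Println(cardStateFor(false, "invalid_grant"))
	fmt.Println(cardStateFor(true, ""))
}
```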

Migration notes

Migration 000040 (connection_auth_events)

Applied automatically on platform startup. Idempotent. Rollback drops the table and its two indexes.

New connection config fields

oauth2_refresh_max_lifetime (optional, string) — duration the operator believes the upstream IdP permits the refresh token to live without use. Accepts day-suffixed durations (60d, 90d) and standard Go duration strings (24h, 30m).

When the IdP does NOT disclose refresh_expires_in, this is the only signal the refresher has to fire proactively. Set it for any IdP with a hard wall-clock refresh expiry — without it, connections to those upstreams will silently die at the wall-clock deadline.
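
Go's time.ParseDuration has no day unit, so a field that accepts both 60d and 24h needs a small wrapper. The following is a minimal sketch of such a parser, assuming whole-number day counts only. The parseLifetime name is hypothetical and the platform's own parser may accept or reject inputs differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseLifetime accepts a whole number of days with a "d" suffix ("60d")
// or any string time.ParseDuration understands ("24h", "30m").
// No Go duration unit ends in "d", so the suffix check is unambiguous.
func parseLifetime(s string) (time.Duration, error) {
	if strings.HasSuffix(s, "d") {
		n, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil || n < 0 {
			return 0, fmt.Errorf("invalid day count in %q", s)
		}
		return time.Duration(n) * 24 * time.Hour, nil
	}
	return time.ParseDuration(s)
}

func main() {
	for _, in := range []string{"60d", "90d", "24h", "30m"} {
		d, err := parseLifetime(in)
		fmt.Println(in, d, err)
	}
}
```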

Wiring change for operators running custom toolkits

Platform.WireGatewayTokenStore and Platform.WireAPIGatewayTokenStore now also call tk.SetAuthEvents(p.AuthEventWriter()). If you have a custom toolkit registered via RegisterToolkit, add SetAuthEvents(*authevents.Writer) to its public surface and accept the writer the platform passes — otherwise tool-call-triggered refreshes through that toolkit will emit zero events (the connection itself will work; the History panel just won't have data).
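
For a custom toolkit, the change amounts to one setter that keeps the writer it is handed. The sketch below is self-contained, so it declares a local Writer type as a stand-in for the platform's authevents.Writer; the customToolkit type and its field name are hypothetical. A real toolkit would import the platform package and use *authevents.Writer in the signature.

```go
package main

import "fmt"

// Writer is a local stand-in for the platform's authevents.Writer so
// this sketch compiles on its own.
type Writer struct {
	name string
}

// customToolkit represents an operator-maintained toolkit.
type customToolkit struct {
	authEvents *Writer // nil until the platform wires it
}

// SetAuthEvents accepts the writer the platform passes during wiring so
// that refreshes triggered through this toolkit can record events.
func (tk *customToolkit) SetAuthEvents(w *Writer) {
	tk.authEvents = w
}

func main() {
	tk := &customToolkit{}
	tk.SetAuthEvents(&Writer{name: "platform"})
	fmt.Println(tk.authEvents != nil)
}
```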

No breaking API changes

All admin endpoints existing before this release are unchanged. The new GET /api/v1/admin/connections/{kind}/{name}/auth-events is purely additive.

Operator action items

Triage in this order:

  1. Audit your existing OAuth connections. For each, check the upstream IdP family. If it uses wall-clock refresh expiry or one-time-use rotation, add oauth2_refresh_max_lifetime to the connection config now. Without it, connections to those upstreams that have been quiet may die at the wall-clock deadline. The platform will recover (operator clicks Connect again), but the audit trail will show the revoked transition.
  2. Verify the refresher is running. Check pod logs at startup for connoauth: refresher started interval=5m0s access_lead=5m0s refresh_lead=1h0m0s. If absent, the platform may have started without a database — the refresher requires a database-backed token store.
  3. Spot-check the History panel. Open any connection's settings, expand the History section. You should see at least a connect_started and connect_completed event from the connection's original setup. New events accumulate as the refresher and admin actions fire.

Acceptance test

End-to-end verification operators can run themselves:

  1. Connect an mcp or api connection with auth_mode: oauth.
  2. Send zero tool calls for 45 minutes (past Keycloak's default 30-min SSO Session Idle).
  3. Send one tool call. It succeeds.
  4. Open the History panel. There's at least one refresh_succeeded event with actor: system:background-refresh between the connect and the tool call.

Internal quality notes

This release went through 10 rounds of adversarial pre-commit review. Six real defects were caught before merge that would otherwise have shipped — including JSON tags missing from the Event struct (would have rendered the History panel as empty rows), SetAuthEvents never wired from production (would have made the History panel empty in the dominant code path), and a misleading "token row deleted" INFO log emitted before the delete attempt. All findings have regression tests.

make verify clean: fmt, lint, gosec, semgrep, CodeQL, total coverage >=87%, patch coverage >=80%, mutation >=60% efficacy, release-check.

Related

  • Pull request: #399
  • Ticket: #395
  • See also: #394 (portal SPA 401 handling on expired browser session — different surface, same theme of "credential expired with no useful guidance to the operator")

Changelog

  • fd7f907: feat(connoauth): proactive refresh, auth event history, observable revocations (#395) (#399) (@cjimti)

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v1.61.0

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_1.61.0_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.61.0_linux_amd64.tar.gz