mcp-data-platform-v1.62.2
Connection-OAuth: terminal-error classifier + per-row health badge
Three connected fixes for the class of bug where a connection silently dies in the background and the operator has no UI signal that anything is wrong. Surfaced by a production deployment whose upstream IdP returned 400 invalid_client on every refresh: the platform retried every 5 minutes for hours, every API call against that connection failed, and the only way to discover the problem was to open the connection drawer and scroll the History panel.
PR #423.
1. Widen the terminal-error classifier
classifyRefreshError previously wrapped only 400 invalid_grant as definitively revoked. Every other RFC 6749 §5.2 error code was treated as transient. The refresher retried indefinitely, the token row stayed in place, needs_reauth stayed false, and the operator-facing surface had nothing to show.
The terminal set is now widened to match RFC 6749 §5.2 semantics:
| Response | Before (v1.62.1) | After (v1.62.2) |
|---|---|---|
| 400 invalid_grant | terminal | terminal |
| 400 invalid_client | transient | terminal |
| 400 unauthorized_client | transient | terminal |
| 400 unsupported_grant_type | transient | terminal |
| 401 (any code) | transient | terminal |
| 400 invalid_request, server_error, etc. | transient | transient |
| 5xx, network drop, ctx cancel | transient | transient |
All terminal responses route through handleRevoked: token row deleted, needs_reauth=true, ErrNeedsReauth surfaced to callers. The transient set is preserved so a flaky upstream does not force a reconnect over a single retryable blip.
pkg/connoauth/source.go:521-595, pkg/connoauth/errors.go:24-51.
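The widened table can be read as a small decision function. The Go sketch below is illustrative only: classify, Class, and terminalCodes are hypothetical names, not the code at pkg/connoauth/source.go, and it assumes the caller has already extracted the HTTP status and the RFC 6749 error field from the token response.

```go
package main

import "fmt"

// Class is a hypothetical two-valued result for this sketch; the shipped
// classifyRefreshError may signal its decision differently (e.g. via errors).
type Class int

const (
	Transient Class = iota // retry on the next refresh tick
	Terminal               // delete token row, set needs_reauth=true
)

func (c Class) String() string {
	if c == Terminal {
		return "terminal"
	}
	return "transient"
}

// terminalCodes lists the 400 error codes treated as terminal in v1.62.2,
// taken from the table above.
var terminalCodes = map[string]bool{
	"invalid_grant":          true,
	"invalid_client":         true,
	"unauthorized_client":    true,
	"unsupported_grant_type": true,
}

// classify is a hypothetical stand-in for classifyRefreshError. status is the
// HTTP status of the token response (0 for a network drop or ctx cancel) and
// code is the RFC 6749 "error" field from the JSON body.
func classify(status int, code string) Class {
	switch {
	case status == 401:
		return Terminal // any 401 is terminal, regardless of code
	case status == 400 && terminalCodes[code]:
		return Terminal
	default:
		return Transient // 5xx, network errors, invalid_request, server_error, ...
	}
}

func main() {
	fmt.Println(classify(400, "invalid_client"))
	fmt.Println(classify(400, "invalid_request"))
	fmt.Println(classify(503, ""))
}
```

Keeping the terminal codes in one explicit set means the default branch stays transient, so an unrecognized or new IdP response does not force a reconnect.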
2. History panel now carries the actual IdP error code
A new typed sentinel terminalRefreshError preserves the RFC 6749 error field through the sanitization pass. Two downstream consumers benefit:
- classifyRevokedReason returns the specific code (e.g. invalid_client) instead of always invalid_grant. The History panel distinguishes "refresh_token revoked" from "client_secret no longer valid", so the operator knows whether to edit the secret or just click Reconnect.
- RefreshFailedTransient events now populate IDPErrorCode from the pre-classify error, so the History row shows what the IdP returned instead of rendering an empty detail box.
Both call sites route the code through sanitizeOAuthErrorField at the boundary. A chatty or hostile IdP cannot inflate the auth_events.detail JSON column or inject URL-shaped content into the operator-facing tooltip surface.
3. New bulk OAuth-health endpoint plus connection-list badge
Pre-fix, an operator triaging "why is everything failing" had to open each connection one at a time and scroll its History panel. Post-fix:
New bulk endpoint GET /api/v1/admin/connections/oauth-health returns one summary row per connection:
```json
{
  "connections": [
    {
      "kind": "api",
      "name": "my-connection",
      "has_oauth": true,
      "needs_reauth": true,
      "token_acquired": false,
      "idp_error_code": "invalid_client"
    }
  ]
}
```
Non-OAuth connections appear with has_oauth=false so the UI does not have to filter client-side.
New React hook useConnectionsOAuthHealth polls the endpoint every 10 seconds.
New ConnectionOAuthHealthBadge component renders a per-row badge on the connection list view:
| Badge | When | Action |
|---|---|---|
| red reauth (with code in tooltip) | needs_reauth=true | Click Reconnect (or edit the secret first if idp_error_code=invalid_client) |
| amber refresh failing (with code in tooltip) | Last refresh failed transiently; access token still valid | Watch; transient errors usually recover |
| no badge | Healthy | None |
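The badge table can be expressed as a small selection function over one health row. This Go sketch is only a model of the rules above, not the React component: Badge and badgeFor are hypothetical, and it assumes that a non-empty idp_error_code with needs_reauth=false means the newest event was a transient refresh failure.

```go
package main

import "fmt"

// Badge is a hypothetical model of what ConnectionOAuthHealthBadge renders.
type Badge struct {
	Color   string // "red", "amber", or "" for no badge
	Label   string
	Tooltip string // the IdP error code, when known
}

// badgeFor applies the table above to one row from the health endpoint.
func badgeFor(hasOAuth, needsReauth bool, idpErrorCode string) Badge {
	switch {
	case !hasOAuth:
		return Badge{} // non-OAuth connections never get a badge
	case needsReauth:
		return Badge{Color: "red", Label: "reauth", Tooltip: idpErrorCode}
	case idpErrorCode != "":
		// Assumed signal for a transient failure: code present, no reauth yet.
		return Badge{Color: "amber", Label: "refresh failing", Tooltip: idpErrorCode}
	default:
		return Badge{} // healthy
	}
}

func main() {
	fmt.Printf("%+v\n", badgeFor(true, true, "invalid_client"))
	fmt.Printf("%+v\n", badgeFor(true, false, "server_error"))
	fmt.Printf("%+v\n", badgeFor(true, false, ""))
}
```

Checking needs_reauth before the error code gives red precedence, so a terminal failure is never shown as the softer amber state.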
The operator sees connection trouble from the list view without clicking in.
latestRefreshErrorCode reads only the single newest event: if anything other than refresh_failed_revoked or refresh_failed_transient sits at events[0], the code is empty. An older failure followed by a newer connect_completed, refresh_succeeded, or token_deleted_admin is therefore not reported as a current error, and the badge clears immediately on reconnect.
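The newest-event rule fits in a few lines. In this Go sketch AuthEvent is a hypothetical, trimmed-down event row, and the slice is assumed to be ordered newest first; it illustrates the rule rather than reproducing the shipped function.

```go
package main

import "fmt"

// AuthEvent is a hypothetical, trimmed-down auth event. Events are assumed
// to be ordered newest first, so events[0] is the latest.
type AuthEvent struct {
	Type         string
	IDPErrorCode string
}

// latestRefreshErrorCode returns the IdP code only when the newest event is a
// refresh failure; any other newest event means "no current error".
func latestRefreshErrorCode(events []AuthEvent) string {
	if len(events) == 0 {
		return ""
	}
	switch latest := events[0]; latest.Type {
	case "refresh_failed_revoked", "refresh_failed_transient":
		return latest.IDPErrorCode
	default:
		return ""
	}
}

func main() {
	history := []AuthEvent{
		{Type: "connect_completed"},                                      // newest: operator reconnected
		{Type: "refresh_failed_revoked", IDPErrorCode: "invalid_client"}, // older failure
	}
	fmt.Printf("%q\n", latestRefreshErrorCode(history))     // ""
	fmt.Printf("%q\n", latestRefreshErrorCode(history[1:])) // "invalid_client"
}
```

Looking only at events[0] avoids scanning history for "the last failure", which is exactly what would keep a stale code alive after a successful reconnect.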
Tests
13 new tests pin every piece, including:
- Full terminal set table-driven test (8 cases covering each new RFC 6749 code plus representative transient cases).
- IdP error code surfaces correctly into auth-event details, on both terminal and transient paths.
- 5 KB hostile IdP error code is bounded to 200 chars; URL-shaped content is stripped.
- Bulk health endpoint: empty store, mixed OAuth + non-OAuth connections, store-error 500, code from latest event, recent success clears the code, AND reconnect clears the badge even with an older refresh_failed_revoked in history.
make verify is green (full suite: test/race, coverage gate, patch coverage, golangci-lint, gosec, govulncheck, semgrep, codeql, dead-code, mutation testing, GoReleaser dry-run). Two-round adversarial pre-commit review: round 1 surfaced 5 substantive findings (stale code after reconnect, vendor-name violation, em dashes, brittle Limit constant, unsanitized IdP code), all fixed; round 2 came back clean.
Upgrade notes
- No schema change, no config change, no breaking API change. Drop-in upgrade from v1.62.1.
- Connections previously stuck in silent retry against invalid_client recover within one refresh tick after the rollout: the next refresh attempt now classifies as terminal, deletes the token row, and surfaces needs_reauth=true. Operators see the red reauth badge immediately on the connection list, click Reconnect, and the connection works again. No data is lost; the refresh_token deletion is the same path invalid_grant always took.
- The new endpoint adds one DB read per connection row per 10 s poll. This is acceptable for typical connection counts and could be batched into a single query if an instance ever grows to hundreds of connections.
Installation
Homebrew (macOS)
```bash
brew install txn2/tap/mcp-data-platform
```
Claude Code CLI
```bash
claude mcp add mcp-data-platform -- mcp-data-platform
```
Docker
```bash
docker pull ghcr.io/txn2/mcp-data-platform:v1.62.2
```
Verification
All release artifacts are signed with Cosign. Verify with:
```bash
cosign verify-blob --bundle mcp-data-platform_1.62.2_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.62.2_linux_amd64.tar.gz
```