Skip to content

mcp-data-platform-v1.62.3

Choose a tag to compare

@github-actions github-actions released this 18 May 00:34
· 105 commits to main since this release
dd8bc39

Root-cause fix: OAuth refresh fails when client_secret contains URL-significant characters

Fixes the production failure pattern where a connection works for roughly one hour after every Connect, then dies with invalid_client until the next manual Reconnect. v1.62.2 surfaced this failure in the UI (red reauth badge, History event); v1.62.3 fixes the underlying cause.

PR #424.

Root cause

golang.org/x/oauth2 v0.36.0/internal/token.go:202 wraps client_id and client_secret in url.QueryEscape() before building the Basic auth header on the refresh path:

req.SetBasicAuth(url.QueryEscape(clientID), url.QueryEscape(clientSecret))

That follows RFC 6749 §2.3.1 to the letter. Many production IdPs (Salesforce, GitHub, and similar OAuth 2.0 token endpoints) store the raw client_secret value server-side and compare against the raw secret in the inbound Basic auth header. A URL-encoded secret in the header is literally a different password from what the IdP has on file, and the IdP returns 400 invalid_client.

A client_secret containing any character URL-encoding rewrites (+, /, = mid-string, space, etc.) hits this directly. Example: secret +8r5xs3YLFC/LHK= becomes %2B8r5xs3YLFC%2FLHK%3D on the wire. Not the secret the IdP has on file.

The symptom-pair the bug explained

Operation Code path URL-encodes? Behavior
Initial code-for-tokens exchange Our custom code at pkg/connoauth/exchange.go:139 No Works
First refresh after Connect golang.org/x/oauth2.Config.TokenSource Yes Returns 400 invalid_client
Reference Python proxy refresh against the same kind of IdP requests.HTTPBasicAuth No Works for years

The initial Connect always succeeded (the code-exchange path uses plain SetBasicAuth, no encoding). The refresh attempted roughly 55 minutes later always failed (the library wraps the secret with url.QueryEscape). The connection's first refresh tick after every Connect killed it. Reconnect would issue a fresh token, the fresh token would refresh-fail an hour later, repeat.

Fix

New file pkg/connoauth/refresh.go with Refresh / buildRefreshRequest / postRefresh / decodeRefreshResponse, mirroring the working Exchange code path:

// pkg/connoauth/refresh.go:124-127
if in.Config.EndpointAuthStyle != oauth2.AuthStyleInParams {
    // Plain SetBasicAuth: no URL-encoding of the credentials.
    req.SetBasicAuth(in.Config.ClientID, in.Config.ClientSecret)
}

Source.refresh now calls this instead of oauth2.Config.TokenSource. The body-encoded auth style (AuthStyleInParams) is supported and unaffected; url.Values.Encode() is standard form encoding that the IdP URL-decodes server-side back to the raw secret, so that round-trip is correct.

postRefresh translates non-2xx responses into *oauth2.RetrieveError so the downstream classifyRefreshError / idpErrorCodeOf / isTerminalIDPRejection pipeline introduced in v1.62.2 keeps working unchanged.

Dead-code cleanup made possible by the refactor

  • Source.client field and its initializer in NewSource: was only consumed by the now-deleted context.WithValue(ctx, oauth2.HTTPClient, s.client) call in the old refresh path. Removed.
  • refreshDeadlineFromToken: only consumed by the two persist / emit sites in the old refresh, both now read RefreshResult.RefreshExpiresAt directly. Removed (function and its test).
  • Config.oauth2Config helper: zero remaining callers. Removed.

persistRefreshed now takes *RefreshResult instead of *oauth2.Token.

Regression tests

Test Pins
TestRefresh_BasicAuth_SendsRawSecret Stands up a fake IdP, sends the literal +8r5xs3YLFC/LHK= secret, asserts the Authorization header carries the base64 of client:+8r5xs3YLFC/LHK= and NOT the URL-encoded form. This is the test whose absence let the bug ship.
TestRefresh_AuthStyleInParams_SecretInBody Body-encoded path still delivers the raw secret via form encoding round-trip.
TestRefresh_IdPError_Returns_RetrieveError 4xx responses are wrapped as *oauth2.RetrieveError so the existing classify pipeline keeps working.
TestRefresh_PersistsRotatedToken Rotated refresh_token plus refresh_expires_in flow through into RefreshResult.
TestRefresh_ValidateInput Malformed input rejected before any network call.

Why every prior gate missed this

  • The library is RFC-correct. make verify, golangci-lint, gosec, CodeQL, and semgrep all see a stdlib-style oauth2 call that follows the spec. None of them know that real IdPs reject what the spec says.
  • The exchange-vs-refresh asymmetry was invisible to single-file review. The exchange path is in exchange.go (custom, works). The refresh path was a delegating one-liner to a third-party library (broken). Comparing them required noticing they were the same problem shape with different implementations.
  • v1.62.2 widened the terminal-error classifier so the failure became visible (red reauth badge, Token row deleted event) instead of silently retrying. That made the badge red but did not fix the underlying refresh failure. v1.62.3 fixes the underlying failure.
  • A reference Python proxy that worked for years on the same kind of IdP was available the whole time. Comparing it earlier would have surfaced this. Lesson recorded: when our code disagrees with a working reference, compare wire shape byte-for-byte before theorizing.

Adversarial pre-commit review

Three rounds of adversarial review before commit:

  1. Round 1: 2 findings (dead code from the refactor: Source.client, refreshDeadlineFromToken). Both deleted.
  2. Round 2: 3 findings (vendor-specific brand names in doc comments). All scrubbed to generic examples.
  3. Round 3: 2 findings (gosec G120: parsing form data without http.MaxBytesReader in the new test). Fixed by wrapping r.Body in MaxBytesReader before ParseForm and switching to PostFormValue. Notable: the raw gosec binary did not report these issues, but golangci-lint run --new-from-rev=$MERGE_BASE did. The gate caught what local verification missed.
  4. Round 4: CLEAN.

make verify green locally (full suite: test/race, coverage, patch coverage, golangci-lint, gosec, govulncheck, semgrep, codeql, dead-code, mutation, GoReleaser dry-run).

Upgrade notes

  • No schema change, no config change, no breaking API change. Drop-in upgrade from v1.62.2.
  • Connections stuck on needs_reauth because their first refresh post-v1.62.2 hit invalid_client: after rolling to v1.62.3, click Reconnect once. The new token's refresh tick at ~55 minutes will succeed instead of failing, and the connection stays alive across the normal refresh cycle.
  • Connections that worked yesterday but stopped working today with invalid_client: the refresh-time URL-encoding mismatch was likely always present in the library path, but became visible when the operator's stored client_secret value happened to contain +, /, =, or another URL-significant character. After rolling to v1.62.3, those connections refresh successfully again.
  • On-deploy diagnostic: Refresh succeeded events should appear in the connection's History panel within roughly 55 minutes of each Connect, instead of the prior Refresh rejected by IdP (invalid_client) followed by Token row deleted. The red reauth badge from v1.62.2 should not reappear after a fresh Connect.

Installation

Homebrew (macOS)

brew install txn2/tap/mcp-data-platform

Claude Code CLI

claude mcp add mcp-data-platform -- mcp-data-platform

Docker

docker pull ghcr.io/txn2/mcp-data-platform:v1.62.3

Verification

All release artifacts are signed with Cosign. Verify with:

cosign verify-blob --bundle mcp-data-platform_1.62.3_linux_amd64.tar.gz.sigstore.json \
  mcp-data-platform_1.62.3_linux_amd64.tar.gz