mcp-data-platform-v1.62.3
Root-cause fix: OAuth refresh fails when client_secret contains URL-significant characters
Fixes the production failure pattern where a connection works for roughly one hour after every Connect, then dies with invalid_client until the next manual Reconnect. v1.62.2 surfaced this failure in the UI (red reauth badge, History event); v1.62.3 fixes the underlying cause.
PR #424.
Root cause
golang.org/x/oauth2 v0.36.0/internal/token.go:202 wraps client_id and client_secret in url.QueryEscape() before building the Basic auth header on the refresh path:
req.SetBasicAuth(url.QueryEscape(clientID), url.QueryEscape(clientSecret))That follows RFC 6749 §2.3.1 to the letter. Many production IdPs (Salesforce, GitHub, and similar OAuth 2.0 token endpoints) store the raw client_secret value server-side and compare against the raw secret in the inbound Basic auth header. A URL-encoded secret in the header is literally a different password from what the IdP has on file, and the IdP returns 400 invalid_client.
A client_secret containing any character URL-encoding rewrites (+, /, = mid-string, space, etc.) hits this directly. Example: secret +8r5xs3YLFC/LHK= becomes %2B8r5xs3YLFC%2FLHK%3D on the wire. Not the secret the IdP has on file.
The symptom-pair the bug explained
| Operation | Code path | URL-encodes? | Behavior |
|---|---|---|---|
| Initial code-for-tokens exchange | Our custom code at pkg/connoauth/exchange.go:139 |
No | Works |
| First refresh after Connect | golang.org/x/oauth2.Config.TokenSource |
Yes | Returns 400 invalid_client |
| Reference Python proxy refresh against the same kind of IdP | requests.HTTPBasicAuth |
No | Works for years |
The initial Connect always succeeded (the code-exchange path uses plain SetBasicAuth, no encoding). The refresh attempted roughly 55 minutes later always failed (the library wraps the secret with url.QueryEscape). The connection's first refresh tick after every Connect killed it. Reconnect would issue a fresh token, the fresh token would refresh-fail an hour later, repeat.
Fix
New file pkg/connoauth/refresh.go with Refresh / buildRefreshRequest / postRefresh / decodeRefreshResponse, mirroring the working Exchange code path:
// pkg/connoauth/refresh.go:124-127
if in.Config.EndpointAuthStyle != oauth2.AuthStyleInParams {
// Plain SetBasicAuth: no URL-encoding of the credentials.
req.SetBasicAuth(in.Config.ClientID, in.Config.ClientSecret)
}Source.refresh now calls this instead of oauth2.Config.TokenSource. The body-encoded auth style (AuthStyleInParams) is supported and unaffected; url.Values.Encode() is standard form encoding that the IdP URL-decodes server-side back to the raw secret, so that round-trip is correct.
postRefresh translates non-2xx responses into *oauth2.RetrieveError so the downstream classifyRefreshError / idpErrorCodeOf / isTerminalIDPRejection pipeline introduced in v1.62.2 keeps working unchanged.
Dead-code cleanup made possible by the refactor
Source.clientfield and its initializer inNewSource: was only consumed by the now-deletedcontext.WithValue(ctx, oauth2.HTTPClient, s.client)call in the old refresh path. Removed.refreshDeadlineFromToken: only consumed by the two persist / emit sites in the old refresh, both now readRefreshResult.RefreshExpiresAtdirectly. Removed (function and its test).Config.oauth2Confighelper: zero remaining callers. Removed.
persistRefreshed now takes *RefreshResult instead of *oauth2.Token.
Regression tests
| Test | Pins |
|---|---|
TestRefresh_BasicAuth_SendsRawSecret |
Stands up a fake IdP, sends the literal +8r5xs3YLFC/LHK= secret, asserts the Authorization header carries the base64 of client:+8r5xs3YLFC/LHK= and NOT the URL-encoded form. This is the test whose absence let the bug ship. |
TestRefresh_AuthStyleInParams_SecretInBody |
Body-encoded path still delivers the raw secret via form encoding round-trip. |
TestRefresh_IdPError_Returns_RetrieveError |
4xx responses are wrapped as *oauth2.RetrieveError so the existing classify pipeline keeps working. |
TestRefresh_PersistsRotatedToken |
Rotated refresh_token plus refresh_expires_in flow through into RefreshResult. |
TestRefresh_ValidateInput |
Malformed input rejected before any network call. |
Why every prior gate missed this
- The library is RFC-correct.
make verify,golangci-lint,gosec, CodeQL, and semgrep all see a stdlib-style oauth2 call that follows the spec. None of them know that real IdPs reject what the spec says. - The exchange-vs-refresh asymmetry was invisible to single-file review. The exchange path is in
exchange.go(custom, works). The refresh path was a delegating one-liner to a third-party library (broken). Comparing them required noticing they were the same problem shape with different implementations. - v1.62.2 widened the terminal-error classifier so the failure became visible (red
reauthbadge,Token row deletedevent) instead of silently retrying. That made the badge red but did not fix the underlying refresh failure. v1.62.3 fixes the underlying failure. - A reference Python proxy that worked for years on the same kind of IdP was available the whole time. Comparing it earlier would have surfaced this. Lesson recorded: when our code disagrees with a working reference, compare wire shape byte-for-byte before theorizing.
Adversarial pre-commit review
Three rounds of adversarial review before commit:
- Round 1: 2 findings (dead code from the refactor:
Source.client,refreshDeadlineFromToken). Both deleted. - Round 2: 3 findings (vendor-specific brand names in doc comments). All scrubbed to generic examples.
- Round 3: 2 findings (
gosec G120: parsing form data withouthttp.MaxBytesReaderin the new test). Fixed by wrappingr.BodyinMaxBytesReaderbeforeParseFormand switching toPostFormValue. Notable: the rawgosecbinary did not report these issues, butgolangci-lint run --new-from-rev=$MERGE_BASEdid. The gate caught what local verification missed. - Round 4: CLEAN.
make verify green locally (full suite: test/race, coverage, patch coverage, golangci-lint, gosec, govulncheck, semgrep, codeql, dead-code, mutation, GoReleaser dry-run).
Upgrade notes
- No schema change, no config change, no breaking API change. Drop-in upgrade from v1.62.2.
- Connections stuck on
needs_reauthbecause their first refresh post-v1.62.2 hitinvalid_client: after rolling to v1.62.3, click Reconnect once. The new token's refresh tick at ~55 minutes will succeed instead of failing, and the connection stays alive across the normal refresh cycle. - Connections that worked yesterday but stopped working today with
invalid_client: the refresh-time URL-encoding mismatch was likely always present in the library path, but became visible when the operator's storedclient_secretvalue happened to contain+,/,=, or another URL-significant character. After rolling to v1.62.3, those connections refresh successfully again. - On-deploy diagnostic:
Refresh succeededevents should appear in the connection's History panel within roughly 55 minutes of each Connect, instead of the priorRefresh rejected by IdP (invalid_client)followed byToken row deleted. The redreauthbadge from v1.62.2 should not reappear after a fresh Connect.
Installation
Homebrew (macOS)
brew install txn2/tap/mcp-data-platformClaude Code CLI
claude mcp add mcp-data-platform -- mcp-data-platformDocker
docker pull ghcr.io/txn2/mcp-data-platform:v1.62.3Verification
All release artifacts are signed with Cosign. Verify with:
cosign verify-blob --bundle mcp-data-platform_1.62.3_linux_amd64.tar.gz.sigstore.json \
mcp-data-platform_1.62.3_linux_amd64.tar.gz