fix(social): distinguish platform-down from token-expired#49
Merged
paulocastellano merged 14 commits intoMay 19, 2026
Conversation
When a provider's API was down (5xx, timeout, DNS), the hourly RefreshSocialToken job and daily VerifyWorkspaceConnections job were treating it as "token revoked" and emailing the user to reconnect. Bluesky going offline triggered false-positive disconnect notifications because Bluesky access tokens are short-lived (2h) so every hourly refresh failed during the outage. - New PlatformUnavailableException: API unreachable / 5xx, transient. TokenExpiredException stays for 4xx (token is provably bad). - New TokenRefreshClient: normalizes failure semantics for OAuth refresh HTTP calls across all providers. Takes a Platform enum so typos fail at compile time and the user-facing label comes from one source. - ConnectionVerifier: all 8 refresh*Token methods route through the new client. Hardcoded OAuth URLs (LinkedIn, YouTube) and Bluesky's default PDS host moved into config/trypost.php alongside the existing per-platform entries. - RefreshSocialToken job: PlatformUnavailableException → log warning and stop. Do NOT markAsTokenExpired, do NOT notify the user. Next scheduled tick retries. - VerifyWorkspaceConnections job: PlatformUnavailableException from the inner refresh propagates and is treated as a transient skip.
Review follow-ups: - verifyMastodon was the last hardcoded host left after the PR moved LinkedIn/YouTube/Bluesky to config. Adds trypost.platforms.mastodon .default_instance (env MASTODON_DEFAULT_INSTANCE) and reads from it. - refreshToken() docblock now declares @throws PlatformUnavailableException (the whole point of the PR was missing from its contract). - Strip the new explanatory comments inside catch blocks and tests — rationale lives in the commit / PR, not inline. The two comments inside empty `catch (TokenExpiredException) {}` blocks stay because there the comment is the only thing telling the reader why the exception is swallowed.
Earlier in the PR the new configs (linkedin.oauth_api, youtube.oauth_api, bluesky.default_service, mastodon.default_instance) were only read by ConnectionVerifier. The same URLs were still hardcoded in the publishers, analytics and the Bluesky auth controller — meaning a self-hosted user setting BLUESKY_DEFAULT_SERVICE or MASTODON_DEFAULT_INSTANCE in env would get split behavior: refresh/verify honor the override, publish/analytics don't. Routes all 10 remaining call sites through the same config values so the overrides actually work end-to-end.
LinkedInPageAnalytics and MastodonAnalytics were the only two of the 11 files refactored to read OAuth host / default instance from config that had zero test coverage. Adds smoke tests that assert the HTTP request hits the configured URL, so a typo in the config key (e.g. linkedin.api vs linkedin.oauth_api) would now fail loudly.
The previous commits in this PR closed the loophole on the hourly / daily token-refresh jobs. The same loophole remained on the publish path: every per-platform publisher (LinkedIn, X, YouTube, TikTok, Threads, Instagram, Pinterest, Bluesky and their Analytics siblings) has its own refreshToken() called before publishing a scheduled post, and all of those treated any non-2xx as TokenExpired — including 5xx. Result before this commit: a Bluesky outage that coincided with a scheduled publish would mark the account as expired and fail the post. Changes: - Route every refreshToken() in the 16 publisher / analytics classes through TokenRefreshClient::for(Platform::X)->send(...). - TokenRefreshClient now also fills platformErrorCode from the HTTP status and pulls error_description / error.message from the JSON body, preserving the richer info LinkedIn / X / TikTok / Pinterest / Threads used to put on their TokenExpiredException. - PublishToSocialPlatform catches PlatformUnavailableException explicitly: the post is marked failed (category: platform_unavailable, with http_status in error_context) but the account stays Connected. No retry inside this job — the scheduler reattempts the next run. Test added: publish flow does NOT mark account expired when the publisher throws PlatformUnavailable. Full suite: 1569 passing.
Every per-platform publisher and analytics class had its own private refreshToken() implementation, all doing essentially the same thing as ConnectionVerifier's per-platform refresh*Token methods. ~17 copies of near-identical OAuth refresh logic across the codebase, which is how the 5xx→TokenExpired bug got planted in the publish path even after we fixed it on the hourly/daily jobs. Consolidation: - All publish/analytics paths call app(ConnectionVerifier::class) ->refreshToken($account) instead of their own implementation. - 17 private refreshToken() methods deleted. - refreshTokenWithLock helper removed from HasSocialHttpClient trait (its lock semantics are duplicated by ConnectionVerifier's per- account lock). - ConnectionVerifier::refreshLinkedInToken passes $account->platform to TokenRefreshClient so the "LinkedIn" vs "LinkedIn Page" label in error messages is preserved. - TokenRefreshClient now treats HTTP 429 the same as 5xx (raises PlatformUnavailableException) — replaces the retry-on-429 behavior that socialHttp() provided to the deleted refresh methods. Net: -460 LOC. Single per-platform refresh implementation. Every fix or new platform now lands in exactly one place.
Same shape of bug as the refreshToken duplication: the regex that strips access_token / Bearer headers from logged HTTP bodies existed in three near-identical copies (HasSocialHttpClient trait, TokenRefreshClient, SocialPublishException), drifting subtly — SocialPublishException was missing the JSON "token" pattern. Extracts a TokenRedactor::redact(string) helper and routes all callers through it. Adding a new token format now means one regex in one file.
Lets the call sites drop the explicit null check ternary. The redactor itself owns the null handling — one less conditional at every caller.
- TokenRedactorTest (7 unit tests): all four regex patterns, multiple secrets in one body, null input, and the no-op pass-through case. Guards against silent regression of the redaction regexes (which previously drifted across three duplicated copies). - ConnectionVerifierTest: HTTP 429 during refresh raises PlatformUnavailableException, not TokenExpiredException. Locks in the rate-limit-as-transient behavior added when consolidating the refresh logic.
Same rule applied to production code in earlier commits now applies to
tests: Http::fake patterns and assertions read from
config('trypost.platforms.*.oauth_api' / '.api' / '.default_service')
instead of hardcoded strings. Hardcoded URLs in tests drift silently
when the config changes.
Also documents the rule in CLAUDE.md under "External Service URLs" so
new code (and tests) start in the right place — only the host comes
from config, path/RPC segments stay inline next to the call.
Edge case from the prior commits: if the publisher throws TokenExpired (401 path), PublishToSocialPlatform attempts refreshAccountToken() to recover. That internally goes through ConnectionVerifier::verify, which can now raise PlatformUnavailable (5xx). The old catch (\Throwable) swallowed it but the loop still fell through to markAsTokenExpired — meaning a transient platform outage during a retry could still flip the account to expired. Adds an explicit PlatformUnavailable catch in the retry block: marks the post failed with category platform_unavailable and breaks before touching the account status.
Every refresh*Token method inside ConnectionVerifier already finishes with \$account->update(...) (mutates the model) + \$account->refresh() (reloads from DB to pick up sibling updates like LinkedInTokenSynchronizer). The lock-contention branch in refreshToken() also calls \$account->refresh() before returning. So a second \$account->refresh() in the caller was always a redundant SELECT — no scenario where it actually pulled a different value than what CV already left in memory. Removed from all 15 call sites, plus fixed the duplicated "Mastodon tokens don't expire" comment.
…iling Before: a scheduled post hitting a platform outage was marked Failed — user had to manually retry. Now the job reschedules itself for 10 minutes later and the PostPlatform shows status "Retrying". Loops indefinitely until the platform accepts the post. - Adds PostPlatformStatus::Retrying (existing string column, no migration) - PublishToSocialPlatform: PlatformUnavailable catch now calls rescheduleForRetry() which (a) updates the row to Retrying with retry_count + next_attempt_at in error_context, and (b) dispatches itself with a 10-minute delay. updatePostStatus() naturally leaves the parent Post in Publishing because Retrying is neither Published nor Failed. - Same treatment for the retry-refresh edge case (publisher throws TokenExpired, refresh subsequently fails with PlatformUnavailable). - i18n + frontend status config updated (en, pt-BR, es) for both posts.status.retrying and posts.edit.status.retrying. - Tests: 3 new tests covering the dispatched job, the edge case path, and retry_count increment across attempts.
Audit found three behaviors with no explicit assertion: - Job is dispatched with a 10-minute delay (would silently regress if the duration changed). Uses Carbon::setTestNow + Bus::assertDispatched inspecting \$job->delay. - error_context.last_attempt_at is recorded at the moment of failure. - updatePostStatus does NOT finalize the parent Post while any of its platforms is in Retrying — covers the central invariant of the feature (the post must stay Publishing until every platform lands in Published or Failed). - A platform currently in Retrying transitions to Published when the next attempt succeeds — proves the loop terminates.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When a social provider's API was down (5xx, timeout, DNS), the hourly
RefreshSocialTokenjob and the dailyVerifyWorkspaceConnectionsjob were treating the failure as "token revoked" — marking accounts as expired and emailing the user to reconnect. Bluesky going offline triggered false-positive disconnect notifications because Bluesky access tokens are short-lived (2h), so every hourly refresh failed during the outage.The fix introduces a separate exception for transient platform-down failures and routes all OAuth refresh HTTP calls through a small, typed client.
Changes
PlatformUnavailableException— new. Raised forConnectionException, 5xx. Means "API is unreachable, retry later".TokenExpiredException— unchanged. Reserved for 4xx (token is provably bad).TokenRefreshClient— new. Normalizes failure semantics for OAuth refresh calls across all 8 providers. Takes aPlatformenum so typos are compile-time errors and the user-facing label comes from a single source (Platform::label()).ConnectionVerifier— allrefresh*Tokenmethods route through the new client. Hardcoded OAuth URLs (LinkedIn `accessToken`, YouTube `oauth2.googleapis.com`, Bluesky default PDS) moved into `config/trypost.php` alongside the existing per-platform entries.RefreshSocialTokenjob — `PlatformUnavailableException` → log warning and stop. Does not `markAsTokenExpired`, does not notify. Next scheduled tick retries.VerifyWorkspaceConnectionsjob — `PlatformUnavailableException` from the inner refresh is treated as a transient skip (no disconnect, no email).Why a class and not just a helper / macro
Considered three approaches:
ConnectionVerifier(the first cut).Http::macro('forTokenRefresh', ...)with->throw($callback).TokenRefreshClientclass.The macro approach (2) is clean for mapping 4xx/5xx via the throw callback, but doesn't intercept
ConnectionException— that's raised by Guzzle before the response exists, so every caller would still need its owntry/catch. The class encapsulates both code paths insend(), callers stay at one level of nesting, and the class is independently testable.Test plan
🤖 Generated with Claude Code