fix(social): distinguish platform-down from token-expired by paulocastellano · Pull Request #49 · trypostit/trypost

paulocastellano · 2026-05-19T11:17:10Z

Summary

When a social provider's API was down (5xx, timeout, DNS), the hourly RefreshSocialToken job and the daily VerifyWorkspaceConnections job were treating the failure as "token revoked" — marking accounts as expired and emailing the user to reconnect. Bluesky going offline triggered false-positive disconnect notifications because Bluesky access tokens are short-lived (2h), so every hourly refresh failed during the outage.

The fix introduces a separate exception for transient platform-down failures and routes all OAuth refresh HTTP calls through a small, typed client.

Changes

PlatformUnavailableException — new. Raised for ConnectionException, 5xx. Means "API is unreachable, retry later".
TokenExpiredException — unchanged. Reserved for 4xx (token is provably bad).
TokenRefreshClient — new. Normalizes failure semantics for OAuth refresh calls across all 8 providers. Takes a Platform enum so typos are compile-time errors and the user-facing label comes from a single source (Platform::label()).
ConnectionVerifier — all refresh*Token methods route through the new client. Hardcoded OAuth URLs (LinkedIn `accessToken`, YouTube `oauth2.googleapis.com`, Bluesky default PDS) moved into `config/trypost.php` alongside the existing per-platform entries.
RefreshSocialToken job — `PlatformUnavailableException` → log warning and stop. Does not `markAsTokenExpired`, does not notify. Next scheduled tick retries.
VerifyWorkspaceConnections job — `PlatformUnavailableException` from the inner refresh is treated as a transient skip (no disconnect, no email).

Why a class and not just a helper / macro

Considered three approaches:

Private helper method on ConnectionVerifier (the first cut).
Http::macro('forTokenRefresh', ...) with ->throw($callback).
Dedicated TokenRefreshClient class.

The macro approach (2) is clean for mapping 4xx/5xx via the throw callback, but doesn't intercept ConnectionException — that's raised by Guzzle before the response exists, so every caller would still need its own try/catch. The class encapsulates both code paths in send(), callers stay at one level of nesting, and the class is independently testable.

Test plan

`php artisan test --compact --parallel` — full suite 1565 passing
Bluesky 5xx during refresh raises `PlatformUnavailable`, not `TokenExpired` (incl. when password fallback is stored)
LinkedIn / X / YouTube / Pinterest / TikTok / Threads / Instagram path: 5xx → unavailable, 4xx → expired, ConnectionException → unavailable
`RefreshSocialToken` job: PlatformUnavailable leaves status `Connected`, no `SendNotification` dispatched
`VerifyWorkspaceConnections` job: PlatformUnavailable leaves status `Connected`, no email queued
Manual: stop Bluesky in fake/test mode for >2h and confirm no reconnect email is sent

🤖 Generated with Claude Code

When a provider's API was down (5xx, timeout, DNS), the hourly RefreshSocialToken job and daily VerifyWorkspaceConnections job were treating it as "token revoked" and emailing the user to reconnect. Bluesky going offline triggered false-positive disconnect notifications because Bluesky access tokens are short-lived (2h) so every hourly refresh failed during the outage. - New PlatformUnavailableException: API unreachable / 5xx, transient. TokenExpiredException stays for 4xx (token is provably bad). - New TokenRefreshClient: normalizes failure semantics for OAuth refresh HTTP calls across all providers. Takes a Platform enum so typos fail at compile time and the user-facing label comes from one source. - ConnectionVerifier: all 8 refresh*Token methods route through the new client. Hardcoded OAuth URLs (LinkedIn, YouTube) and Bluesky's default PDS host moved into config/trypost.php alongside the existing per-platform entries. - RefreshSocialToken job: PlatformUnavailableException → log warning and stop. Do NOT markAsTokenExpired, do NOT notify the user. Next scheduled tick retries. - VerifyWorkspaceConnections job: PlatformUnavailableException from the inner refresh propagates and is treated as a transient skip.

@throws

Review follow-ups: - verifyMastodon was the last hardcoded host left after the PR moved LinkedIn/YouTube/Bluesky to config. Adds trypost.platforms.mastodon .default_instance (env MASTODON_DEFAULT_INSTANCE) and reads from it. - refreshToken() docblock now declares @throws PlatformUnavailableException (the whole point of the PR was missing from its contract). - Strip the new explanatory comments inside catch blocks and tests — rationale lives in the commit / PR, not inline. The two comments inside empty `catch (TokenExpiredException) {}` blocks stay because there the comment is the only thing telling the reader why the exception is swallowed.

Earlier in the PR the new configs (linkedin.oauth_api, youtube.oauth_api, bluesky.default_service, mastodon.default_instance) were only read by ConnectionVerifier. The same URLs were still hardcoded in the publishers, analytics and the Bluesky auth controller — meaning a self-hosted user setting BLUESKY_DEFAULT_SERVICE or MASTODON_DEFAULT_INSTANCE in env would get split behavior: refresh/verify honor the override, publish/analytics don't. Routes all 10 remaining call sites through the same config values so the overrides actually work end-to-end.

LinkedInPageAnalytics and MastodonAnalytics were the only two of the 11 files refactored to read OAuth host / default instance from config that had zero test coverage. Adds smoke tests that assert the HTTP request hits the configured URL, so a typo in the config key (e.g. linkedin.api vs linkedin.oauth_api) would now fail loudly.

The previous commits in this PR closed the loophole on the hourly / daily token-refresh jobs. The same loophole remained on the publish path: every per-platform publisher (LinkedIn, X, YouTube, TikTok, Threads, Instagram, Pinterest, Bluesky and their Analytics siblings) has its own refreshToken() called before publishing a scheduled post, and all of those treated any non-2xx as TokenExpired — including 5xx. Result before this commit: a Bluesky outage that coincided with a scheduled publish would mark the account as expired and fail the post. Changes: - Route every refreshToken() in the 16 publisher / analytics classes through TokenRefreshClient::for(Platform::X)->send(...). - TokenRefreshClient now also fills platformErrorCode from the HTTP status and pulls error_description / error.message from the JSON body, preserving the richer info LinkedIn / X / TikTok / Pinterest / Threads used to put on their TokenExpiredException. - PublishToSocialPlatform catches PlatformUnavailableException explicitly: the post is marked failed (category: platform_unavailable, with http_status in error_context) but the account stays Connected. No retry inside this job — the scheduler reattempts the next run. Test added: publish flow does NOT mark account expired when the publisher throws PlatformUnavailable. Full suite: 1569 passing.

Every per-platform publisher and analytics class had its own private refreshToken() implementation, all doing essentially the same thing as ConnectionVerifier's per-platform refresh*Token methods. ~17 copies of near-identical OAuth refresh logic across the codebase, which is how the 5xx→TokenExpired bug got planted in the publish path even after we fixed it on the hourly/daily jobs. Consolidation: - All publish/analytics paths call app(ConnectionVerifier::class) ->refreshToken($account) instead of their own implementation. - 17 private refreshToken() methods deleted. - refreshTokenWithLock helper removed from HasSocialHttpClient trait (its lock semantics are duplicated by ConnectionVerifier's per- account lock). - ConnectionVerifier::refreshLinkedInToken passes $account->platform to TokenRefreshClient so the "LinkedIn" vs "LinkedIn Page" label in error messages is preserved. - TokenRefreshClient now treats HTTP 429 the same as 5xx (raises PlatformUnavailableException) — replaces the retry-on-429 behavior that socialHttp() provided to the deleted refresh methods. Net: -460 LOC. Single per-platform refresh implementation. Every fix or new platform now lands in exactly one place.

Same shape of bug as the refreshToken duplication: the regex that strips access_token / Bearer headers from logged HTTP bodies existed in three near-identical copies (HasSocialHttpClient trait, TokenRefreshClient, SocialPublishException), drifting subtly — SocialPublishException was missing the JSON "token" pattern. Extracts a TokenRedactor::redact(string) helper and routes all callers through it. Adding a new token format now means one regex in one file.

Lets the call sites drop the explicit null check ternary. The redactor itself owns the null handling — one less conditional at every caller.

- TokenRedactorTest (7 unit tests): all four regex patterns, multiple secrets in one body, null input, and the no-op pass-through case. Guards against silent regression of the redaction regexes (which previously drifted across three duplicated copies). - ConnectionVerifierTest: HTTP 429 during refresh raises PlatformUnavailableException, not TokenExpiredException. Locks in the rate-limit-as-transient behavior added when consolidating the refresh logic.

Same rule applied to production code in earlier commits now applies to tests: Http::fake patterns and assertions read from config('trypost.platforms.*.oauth_api' / '.api' / '.default_service') instead of hardcoded strings. Hardcoded URLs in tests drift silently when the config changes. Also documents the rule in CLAUDE.md under "External Service URLs" so new code (and tests) start in the right place — only the host comes from config, path/RPC segments stay inline next to the call.

Edge case from the prior commits: if the publisher throws TokenExpired (401 path), PublishToSocialPlatform attempts refreshAccountToken() to recover. That internally goes through ConnectionVerifier::verify, which can now raise PlatformUnavailable (5xx). The old catch (\Throwable) swallowed it but the loop still fell through to markAsTokenExpired — meaning a transient platform outage during a retry could still flip the account to expired. Adds an explicit PlatformUnavailable catch in the retry block: marks the post failed with category platform_unavailable and breaks before touching the account status.

Every refresh*Token method inside ConnectionVerifier already finishes with \$account->update(...) (mutates the model) + \$account->refresh() (reloads from DB to pick up sibling updates like LinkedInTokenSynchronizer). The lock-contention branch in refreshToken() also calls \$account->refresh() before returning. So a second \$account->refresh() in the caller was always a redundant SELECT — no scenario where it actually pulled a different value than what CV already left in memory. Removed from all 15 call sites, plus fixed the duplicated "Mastodon tokens don't expire" comment.

…iling Before: a scheduled post hitting a platform outage was marked Failed — user had to manually retry. Now the job reschedules itself for 10 minutes later and the PostPlatform shows status "Retrying". Loops indefinitely until the platform accepts the post. - Adds PostPlatformStatus::Retrying (existing string column, no migration) - PublishToSocialPlatform: PlatformUnavailable catch now calls rescheduleForRetry() which (a) updates the row to Retrying with retry_count + next_attempt_at in error_context, and (b) dispatches itself with a 10-minute delay. updatePostStatus() naturally leaves the parent Post in Publishing because Retrying is neither Published nor Failed. - Same treatment for the retry-refresh edge case (publisher throws TokenExpired, refresh subsequently fails with PlatformUnavailable). - i18n + frontend status config updated (en, pt-BR, es) for both posts.status.retrying and posts.edit.status.retrying. - Tests: 3 new tests covering the dispatched job, the edge case path, and retry_count increment across attempts.

Audit found three behaviors with no explicit assertion: - Job is dispatched with a 10-minute delay (would silently regress if the duration changed). Uses Carbon::setTestNow + Bus::assertDispatched inspecting \$job->delay. - error_context.last_attempt_at is recorded at the moment of failure. - updatePostStatus does NOT finalize the parent Post while any of its platforms is in Retrying — covers the central invariant of the feature (the post must stay Publishing until every platform lands in Published or Failed). - A platform currently in Retrying transitions to Published when the next attempt succeeds — proves the loop terminates.

paulocastellano added 14 commits May 19, 2026 08:16

refactor(social): TokenRedactor::redact accepts nullable

123996b

Lets the call sites drop the explicit null check ternary. The redactor itself owns the null handling — one less conditional at every caller.

paulocastellano merged commit e2caefe into main May 19, 2026
2 checks passed

paulocastellano deleted the fix/platform-unavailable-vs-token-expired branch May 19, 2026 13:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(social): distinguish platform-down from token-expired#49

fix(social): distinguish platform-down from token-expired#49
paulocastellano merged 14 commits into
mainfrom
fix/platform-unavailable-vs-token-expired

paulocastellano commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

paulocastellano commented May 19, 2026

Summary

Changes

Why a class and not just a helper / macro

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant