fix: improve retry logic, error metadata, and container lifecycle #17
Merged
tirthpatell merged 12 commits into main on Mar 13, 2026
Parse error_subcode from Meta API error responses in both createErrorFromResponse and handleAPIError. Add IsTransientError() helper function for checking transient errors from any error type.
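As a rough illustration of the parsing described above, the sketch below decodes `is_transient` and `error_subcode` from an error body. The type names (`metaErrorEnvelope`, `errorMetadata`), the function `parseErrorMetadata`, and the sample subcode value are hypothetical and not taken from the client's source; only the two JSON field names come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metaErrorEnvelope mirrors the assumed shape of a Meta API error body.
// The struct itself is illustrative, not the client's actual type.
type metaErrorEnvelope struct {
	Error struct {
		Message      string `json:"message"`
		Code         int    `json:"code"`
		ErrorSubcode int    `json:"error_subcode"`
		IsTransient  bool   `json:"is_transient"`
	} `json:"error"`
}

// errorMetadata is a hypothetical stand-in for the fields kept on BaseError.
type errorMetadata struct {
	HTTPStatusCode int
	ErrorSubcode   int
	IsTransient    bool
}

// parseErrorMetadata extracts the retry-relevant fields from a response body.
// A body that is not valid JSON still yields the HTTP status code.
func parseErrorMetadata(status int, body []byte) errorMetadata {
	meta := errorMetadata{HTTPStatusCode: status}
	var env metaErrorEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return meta
	}
	meta.ErrorSubcode = env.Error.ErrorSubcode
	meta.IsTransient = env.Error.IsTransient
	return meta
}

func main() {
	// 1234 is a made-up subcode used only for this example.
	body := []byte(`{"error":{"message":"Try again","code":2,"error_subcode":1234,"is_transient":true}}`)
	fmt.Printf("%+v\n", parseErrorMetadata(500, body))
}
```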
shouldRetryStatus in Do() was unreachable because executeRequest already returns an error for HTTP status >= 400, so the status code at that point was always < 400. The retry logic for 429/5xx errors is correctly handled via isRetryableError on the error path.
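A minimal sketch of a retry decision made on the error path, assuming a single hypothetical `apiError` type that records both the Meta error code and the HTTP status. The real `isRetryableError` covers more error types; this only shows why the check has to look at the HTTP status rather than the Meta code.

```go
package main

import (
	"errors"
	"fmt"
)

// apiError is a hypothetical error type. Code is the Meta error code and
// HTTPStatusCode is the transport status; the two must not be confused.
type apiError struct {
	Code           int
	HTTPStatusCode int
	IsTransient    bool
}

func (e *apiError) Error() string {
	return fmt.Sprintf("api error: code=%d status=%d", e.Code, e.HTTPStatusCode)
}

// isRetryable reports whether err should be retried: a transient flag,
// a 429, or any 5xx HTTP status. Errors of other types are not retried.
func isRetryable(err error) bool {
	var apiErr *apiError
	if !errors.As(err, &apiErr) {
		return false
	}
	if apiErr.IsTransient || apiErr.HTTPStatusCode == 429 {
		return true
	}
	return apiErr.HTTPStatusCode >= 500 && apiErr.HTTPStatusCode <= 599
}

func main() {
	fmt.Println(isRetryable(&apiError{Code: 2, HTTPStatusCode: 503}))
	fmt.Println(isRetryable(&apiError{Code: 100, HTTPStatusCode: 400}))
}
```

Comparing `Code` (for example 2) against the 500-599 range would never match, which is why the status has to be carried on the error itself.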
- Extract setErrorMetadata() to DRY up repeated extractBaseError pattern
- Collapse 401/403 into single switch case
- Fold redundant 500-504 case into default (both create APIError)
- Remove redundant InProgress/Published case in waitForContainerReady
Greptile Summary
This PR fixes a set of closely related reliability bugs in the Threads API client: incorrect retry decisions, missing error metadata, a nil-pointer dereference on 429 responses, and unresponsive context cancellation during container polling. It also introduces parallel child-container readiness checks to reduce carousel creation latency.
Confidence Score: 4/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[HTTPClient.Do] --> B{isRetryableError}
B -->|RateLimitError| C[retry with backoff]
B -->|NetworkError| D{Temporary or IsTransient?}
D -->|yes| C
D -->|no| E[return error immediately]
B -->|other typed error| F{extractBaseError}
F --> G{IsTransient?}
G -->|yes| C
G -->|no| H{HTTPStatusCode 5xx?}
H -->|yes| C
H -->|no| E
C --> I{maxRetries exhausted?}
I -->|no| A
I -->|yes| J[return wrapped lastErr]
K[createErrorFromResponse] --> L{HTTP status}
L -->|401 or 403| M[AuthenticationError]
L -->|429| N[RateLimitError + MarkRateLimited]
L -->|400 or 422| O[ValidationError]
L -->|default| P[APIError]
M --> Q[setErrorMetadata]
N --> Q
O --> Q
P --> Q
Q --> R[IsTransient, HTTPStatusCode, ErrorSubcode set on BaseError]
S[CreateCarouselPost] --> T[spawn N goroutines via childCtx]
T --> U[waitForContainerReady with ctx.Done select]
U --> V[buffered results channel]
V --> W[collect all N results, cancelChildren on first error]
W --> X{any non-ctx-cancel error?}
X -->|yes| Y[return lowest-index real error]
X -->|no| Z{any ctx-cancel error?}
Z -->|yes| AA[return first cancel error]
Z -->|no| AB[proceed to carousel container creation]
Last reviewed commit: 18cf6c4
- IsTransientError now uses errors.As to support wrapped errors, consistent with all other IsXxx helpers
- Add context.WithCancel in carousel parallel polling to prevent goroutine leaks on early return
- Guard resp.RateLimit nil dereference on 429 in handleAPIError
- Rewrite context cancellation test with httptest server returning IN_PROGRESS so the select branch is actually exercised
- Add wrapped error test case for IsTransientError
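One way a helper like this can see through wrapping is sketched below. It assumes every typed error embeds a shared base and uses a small unexported interface as the errors.As target; the names (`baseError`, `validationError`, `networkError`, `isTransientError`, `wrap`) are hypothetical, and the actual helper may instead probe each concrete type in turn.

```go
package main

import (
	"errors"
	"fmt"
)

// baseError holds metadata shared by all typed errors (hypothetical layout).
type baseError struct {
	Message     string
	IsTransient bool
}

func (b *baseError) Error() string   { return b.Message }
func (b *baseError) transient() bool { return b.IsTransient }

// Typed errors embed *baseError, so both methods are promoted to them.
type validationError struct{ *baseError }
type networkError struct{ *baseError }

// isTransientError walks the wrap chain with errors.As, so it works on
// errors wrapped via %w as well as on bare typed errors.
func isTransientError(err error) bool {
	var t interface{ transient() bool }
	return errors.As(err, &t) && t.transient()
}

// wrap adds context to err while keeping it reachable through Unwrap.
func wrap(msg string, err error) error {
	return fmt.Errorf("%s: %w", msg, err)
}

func main() {
	inner := networkError{&baseError{Message: "timeout", IsTransient: true}}
	fmt.Println(isTransientError(wrap("publish failed", inner)))
}
```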
Collect all child container results before reporting, so the lowest-indexed failure is always surfaced first — matching the behavior of the prior sequential implementation.
Call cancelChildren() inside the collection loop when an error is detected, so remaining goroutines stop polling at their next ctx.Done() check instead of running to exhaustion.
When one child container fails and cancelChildren() fires, siblings receive context.Canceled. The error reporting loop now skips these cancellation artifacts and surfaces the real root-cause failure first.
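The collection behaviour described in the last three commits can be sketched roughly as follows. This is a simplified model, not the client's code: `waitAll`, `result`, `demo`, and `errChildFailed` are hypothetical names, and the per-child check is passed in as a function instead of polling a real container.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type result struct {
	index int
	err   error
}

var errChildFailed = errors.New("child container failed")

// waitAll runs check for n children in parallel, collects every result, and
// returns the lowest-index real failure, skipping cancellation artifacts.
func waitAll(ctx context.Context, n int, check func(context.Context, int) error) error {
	childCtx, cancelChildren := context.WithCancel(ctx)
	defer cancelChildren()

	results := make(chan result, n) // buffered so no goroutine blocks on send
	for i := 0; i < n; i++ {
		go func(i int) {
			results <- result{index: i, err: check(childCtx, i)}
		}(i)
	}

	errs := make([]error, n)
	for received := 0; received < n; received++ {
		r := <-results
		errs[r.index] = r.err
		if r.err != nil {
			cancelChildren() // siblings stop at their next ctx.Done() check
		}
	}

	var firstCancel error
	for _, err := range errs {
		switch {
		case err == nil:
		case errors.Is(err, context.Canceled):
			if firstCancel == nil {
				firstCancel = err
			}
		default:
			return err // lowest-index root-cause failure
		}
	}
	return firstCancel
}

// demo runs three children; the one at index failing errors out, the rest
// keep "polling" until cancelled. A negative index means all succeed.
func demo(failing int) error {
	check := func(ctx context.Context, i int) error {
		switch {
		case i == failing:
			return errChildFailed
		case failing < 0:
			return nil
		default:
			<-ctx.Done()
			return ctx.Err()
		}
	}
	return waitAll(context.Background(), 3, check)
}

func main() {
	fmt.Println(demo(1))
}
```

Collecting all n results before choosing an error keeps the outcome independent of goroutine scheduling, at the cost of waiting for cancelled siblings to return.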
- isRetryableError now checks netErr.IsTransient alongside Temporary, so a non-temporary but transient NetworkError is correctly retried
- Carousel polling calls cancelChildren() as soon as any failure is received, stopping siblings immediately instead of at function return
wrapNetworkError now stores the original error as Cause, so errors.Is(err, context.Canceled) works through NetworkError wrappers. This fixes the carousel sibling-error filter which relies on errors.Is to distinguish real failures from cancellation side-effects.
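A small sketch of why storing the cause matters, assuming the wrapper exposes it through an `Unwrap` method (the PR text only says the original error is stored as Cause, so the `Unwrap` mechanism and the lowercase type and helper names here are assumptions):

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// networkError is a hypothetical wrapper; Cause keeps the original error.
type networkError struct {
	Message string
	Cause   error
}

func (e *networkError) Error() string { return e.Message }

// Unwrap exposes Cause so errors.Is and errors.As can see through the wrapper.
func (e *networkError) Unwrap() error { return e.Cause }

// wrapNetworkError wraps err while preserving it as the cause.
func wrapNetworkError(err error) error {
	if err == nil {
		return nil
	}
	return &networkError{Message: "network error: " + err.Error(), Cause: err}
}

// isCancellation reports whether err is, or wraps, context.Canceled.
func isCancellation(err error) bool {
	return errors.Is(err, context.Canceled)
}

// wrappedCancel returns a cancellation hidden behind a networkError.
func wrappedCancel() error {
	return wrapNetworkError(context.Canceled)
}

func main() {
	fmt.Println(isCancellation(wrappedCancel()))
	fmt.Println(isCancellation(&networkError{Message: "dns failure"}))
}
```

Without the stored cause, the first check would fail and a cancelled sibling would look like a real network failure.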
Summary
This PR improves the retry/error handling infrastructure and container lifecycle management in the Threads API client:
- Parse is_transient and error_subcode from Meta API error responses: all typed errors (APIError, ValidationError, AuthenticationError, etc.) now carry IsTransient, HTTPStatusCode, and ErrorSubcode fields on the shared BaseError, enabling richer error inspection by callers.
- isRetryableError compared apiErr.Code (the Meta error code, e.g. 2) against the 500-599 range, which never matched. Retry logic now checks baseErr.HTTPStatusCode and the IsTransient flag, so 5xx responses and transient errors are correctly retried.
- Remove unreachable shouldRetryStatus code path: executeRequest already returns an error for HTTP >= 400, so the status-code check in Do() was unreachable. Retry for 429/5xx is now handled via isRetryableError on the error path.
- Add IsTransientError() public helper: follows the existing IsAuthenticationError/IsRateLimitError pattern, letting callers check whether an error is transient without type-switching.
- waitForContainerReady: replaced bare time.Sleep with select on ctx.Done()/time.After, so timeouts and cancellation propagate correctly.
- Extract setErrorMetadata helper and simplify switch cases: DRYs up the repeated extractBaseError + field-setting pattern, merges identical 401/403 and 5xx/default switch arms, and removes the redundant InProgress/Published case in waitForContainerReady.

Test plan
- TestCreateErrorFromResponseParsesIsTransient: verifies is_transient is parsed from JSON
- TestCreateErrorFromResponseParsesErrorSubcode: verifies error_subcode is parsed
- TestIsRetryableErrorWithTransientAPIError: validates retry for transient, 5xx, and non-retryable errors
- TestIsTransientErrorHelper: validates the public IsTransientError() helper
- TestErrorIsTransient / TestErrorSubcode: field-level checks on error types
- TestWaitForContainerReadyRespectsContext: confirms context timeout is honored
- go test ./...
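For reference, the select-based wait mentioned in the summary can be sketched like this. The function `pollUntilReady`, the status strings, and the scripted `demo` helper are assumptions made for the example; the real waitForContainerReady talks to the API and has its own limits.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntilReady calls status until it reports FINISHED, a terminal failure,
// the attempt budget runs out, or ctx is done. Status strings are assumed.
func pollUntilReady(ctx context.Context, interval time.Duration, maxAttempts int, status func() (string, error)) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		s, err := status()
		if err != nil {
			return err
		}
		switch s {
		case "FINISHED":
			return nil
		case "ERROR", "EXPIRED":
			return fmt.Errorf("container entered terminal status %s", s)
		}
		// Unlike a bare time.Sleep, this wait ends as soon as ctx is done.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(interval):
		}
	}
	return errors.New("container not ready after max attempts")
}

// demo polls a scripted sequence of statuses under a 50ms deadline,
// repeating the last status once the script is exhausted.
func demo(intervalMs int, statuses ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	i := 0
	next := func() (string, error) {
		s := statuses[i]
		if i < len(statuses)-1 {
			i++
		}
		return s, nil
	}
	return pollUntilReady(ctx, time.Duration(intervalMs)*time.Millisecond, 10, next)
}

func main() {
	fmt.Println(demo(1, "IN_PROGRESS", "FINISHED"))
	fmt.Println(demo(10000, "IN_PROGRESS"))
}
```

In the second call the poll interval is far longer than the deadline, so the function returns because of the context rather than sleeping through it.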