No retry logic for transient failures in services #73

@user1303836

Problem

Most services lack retry logic for transient failures. When external services have temporary issues (network hiccups, rate limits, 5xx errors), operations fail permanently instead of retrying.

Affected Services:

| Service | Issue |
| --- | --- |
| content_extractor.py | No retry on network errors |
| page_analyzer.py | Retries RateLimitError only, not timeouts or 5xx |
| content_poster.py | No retry on Discord rate limits or temporary unavailability |
| message_forwarder.py | No retry on Discord API hiccups |
| summarizer.py | Retries RateLimitError but not other transient API errors |

Impact

  • Temporary Discord outages cause messages to never be posted
  • Network hiccups cause content to be skipped permanently
  • Items that fail once are never retried (content_poster marks them as "attempted", but the item remains unposted)
  • Users lose content due to transient issues

Proposed Solution

  1. Add retry decorators with exponential backoff for transient errors:
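import httpx
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

def is_transient(exc: BaseException) -> bool:
    # Retry 429 and 5xx responses only; other 4xx responses are permanent.
    if isinstance(exc, httpx.HTTPStatusError):
        return exc.response.status_code == 429 or exc.response.status_code >= 500
    # httpx.TransportError covers timeouts and connection errors.
    return isinstance(exc, httpx.TransportError)

@retry(
    retry=retry_if_exception(is_transient),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    stop=stop_after_attempt(3),
)
async def fetch_content(self, url: str) -> str:
    # ...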
  2. Distinguish between transient errors (retry) and permanent errors (fail immediately):

    • Transient: 429, 500-599, timeouts, connection errors
    • Permanent: 400, 401, 403, 404
  3. Implement a "dead letter queue" for items that fail after max retries
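
As a rough illustration of how the three steps could fit together, here is a minimal, stdlib-only sketch. It is not the project's code: `TransientError`, `PermanentError`, `is_transient_status`, `DeadLetterQueue`, and `process_with_retry` are all hypothetical names, and the in-memory list stands in for whatever persistent store a real dead letter queue would use. The status-code split follows the transient/permanent lists above.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (429, 5xx, timeouts)."""


class PermanentError(Exception):
    """Hypothetical marker for errors that should fail immediately (other 4xx)."""


def is_transient_status(status: int) -> bool:
    # 429 and 500-599 are treated as transient; everything else is not.
    return status == 429 or 500 <= status <= 599


@dataclass
class DeadLetter:
    item: str
    error: str
    attempts: int


@dataclass
class DeadLetterQueue:
    # In-memory stand-in; a real queue would likely be a database table.
    entries: List[DeadLetter] = field(default_factory=list)

    def add(self, item: str, error: Exception, attempts: int) -> None:
        self.entries.append(DeadLetter(item, repr(error), attempts))


async def process_with_retry(
    item: str,
    handler: Callable[[str], Awaitable[None]],
    dlq: DeadLetterQueue,
    max_attempts: int = 3,
    base_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> bool:
    """Return True on success, False if the item was dead-lettered.

    PermanentError is not caught here, so it propagates to the caller
    on the first attempt without any retry.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            await handler(item)
            return True
        except TransientError as exc:
            if attempt == max_attempts:
                # Retries exhausted: record the item instead of dropping it.
                dlq.add(item, exc, attempt)
                return False
            # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
            await sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    return False
```

A caller such as content_poster could then periodically re-drive the entries in the queue rather than marking an item "attempted" and moving on. Passing `sleep` as a parameter is only a convenience for testing without real delays.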

Labels

enhancement, reliability
