
fix: add 429 rate limit retry backoff to OpenAI embedder and VLM#568

Closed
Dilan-TH wants to merge 5 commits into volcengine:main from Dilan-TH:fix/openai-rate-limit-retry

Conversation

@Dilan-TH

Problem

When indexing repositories with many files, the OpenAI embedder and VLM backend immediately fail on 429 RateLimitError responses. These failures are permanent — the embedding queue marks the item as errored rather than retrying — resulting in partially-indexed repositories and an unhealthy queue.

A secondary issue: the embedder sends full file content to the embeddings API without truncation. For large files this exceeds the model's context window (e.g. text-embedding-3-large has an 8192 token limit), causing 400 ContextWindowExceededError.

Changes

openviking/models/embedder/openai_embedders.py

  • Added _call_with_retry() helper that catches openai.RateLimitError and retries with increasing delays (10s → 30s → 60s) before a final attempt
  • Applied retry logic to both embed() and embed_batch()
  • Added _MAX_CHARS = 30000 truncation before sending text to the API (~4 chars/token, keeping well under the 8192-token limit)

openviking/models/vlm/backends/openai_vlm.py

  • Added _RATE_LIMIT_RETRY_DELAYS retry pattern to get_completion() and get_completion_async() using the same 10s → 30s → 60s backoff
  • get_completion_async() uses await asyncio.sleep() to avoid blocking the event loop during retries

Why this approach

The rate limit window on OpenAI-compatible APIs resets every 60 seconds. A fixed delay sequence (10s → 30s → 60s) covers burst scenarios (first retry at 10s) while ensuring the final retry always lands after a full window reset. This is intentionally simple — no jitter, no exponential growth — appropriate for a background indexing workload where throughput matters more than precise timing.
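The timing claim above is easy to check: with a 10s → 30s → 60s sequence, the final attempt fires 100 seconds after the first failure, comfortably past a 60-second window reset.

```python
# Cumulative wall-clock wait before each retry attempt in the PR's fixed sequence.
delays = [10, 30, 60]
cumulative = [sum(delays[:i + 1]) for i in range(len(delays))]
print(cumulative)  # [10, 40, 100]
# The last retry lands 100s after the first 429, i.e. after a full window reset.
```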

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

```python
last_error = e
if attempt < max_retries:
    await asyncio.sleep(2**attempt)
except openai.RateLimitError:
```
Collaborator


[Bug] The `max_retries` parameter of `get_completion_async` is silently ignored. The original implementation retried on all exception types (with exponential backoff, `2**attempt`); the new code retries only on `RateLimitError` and never consults `max_retries`.

If a caller passes `max_retries > 0` expecting generic retries, this is a behavioral regression. Suggest keeping the original generic retry logic and layering the rate-limit-specific handling on top:

```python
async def get_completion_async(
    self, prompt: str, thinking: bool = False, max_retries: int = 0
) -> str:
    last_error = None
    for attempt in range(max(max_retries + 1, len(self._RATE_LIMIT_RETRY_DELAYS) + 1)):
        try:
            response = await client.chat.completions.create(**kwargs)
            self._update_token_usage_from_response(response)
            return response.choices[0].message.content or ""
        except openai.RateLimitError as e:
            last_error = e
            delay = self._RATE_LIMIT_RETRY_DELAYS[min(attempt, len(self._RATE_LIMIT_RETRY_DELAYS) - 1)]
            logger.warning(f"VLM rate limited (429), retrying in {delay}s...")
            await asyncio.sleep(delay)
        except Exception as e:
            last_error = e
            if attempt < max_retries:
                await asyncio.sleep(2 ** attempt)
            else:
                break
    raise last_error
```


```python
def get_completion(self, prompt: str, thinking: bool = False) -> str:
    """Get text completion"""
    import openai
```
Collaborator


[Suggestion] The bare `import openai` added here raises a raw ImportError when openai is not installed, bypassing the friendly `ImportError("Please install openai: pip install openai")` raised in `get_client()`.

Suggest moving `import openai` after the `get_client()` call, or wrapping it in a consistent try/except with the same friendly message. The same applies inside `get_completion_async` (line 93).

```python
    return

# Retry delays (seconds) on 429 RateLimitError before each successive attempt
_RATE_LIMIT_RETRY_DELAYS = [10, 30, 60]
```
Collaborator


[Suggestion] `get_vision_completion` and `get_vision_completion_async` can also hit 429 rate limits, but gained no retry logic. Suggest extracting a shared retry helper (similar to `_call_with_retry` in the embedder) and applying it to all completion methods, so vision calls don't still fail on 429 during bulk image indexing.
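A shared helper along the lines of this suggestion could be a decorator factory; a sketch under the assumption that rate-limit detection and sleeping are injectable (the name `retry_on_rate_limit` is invented here, and an async twin using `await asyncio.sleep` would cover the async methods):

```python
import functools
import time

RATE_LIMIT_RETRY_DELAYS = [10, 30, 60]

def retry_on_rate_limit(is_rate_limit, delays=RATE_LIMIT_RETRY_DELAYS, sleep=time.sleep):
    """Decorator factory: retry the wrapped call on rate-limit errors, then one final attempt."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for delay in delays:
                try:
                    return fn(*args, **kwargs)
                except Exception as e:
                    if not is_rate_limit(e):
                        raise  # non-rate-limit errors propagate immediately
                    sleep(delay)
            return fn(*args, **kwargs)  # final attempt after the last delay
        return wrapper
    return decorator
```

Applied as `@retry_on_rate_limit(lambda e: isinstance(e, openai.RateLimitError))` on `get_completion` and `get_vision_completion`, this would give all four completion paths the same 10s → 30s → 60s behavior.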

@Dilan-TH Dilan-TH closed this Mar 13, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 13, 2026
@Dilan-TH Dilan-TH deleted the fix/openai-rate-limit-retry branch March 13, 2026 04:17
