fix: add 429 rate limit retry backoff to OpenAI embedder and VLM #568
Dilan-TH wants to merge 5 commits into volcengine:main
Conversation
            last_error = e
            if attempt < max_retries:
                await asyncio.sleep(2**attempt)
    except openai.RateLimitError:
[Bug] The max_retries parameter of get_completion_async is silently ignored. The previous implementation retried on all exception types (with exponential backoff 2**attempt); the new code only retries on RateLimitError and never consults max_retries.
Any caller passing max_retries > 0 and expecting generic retries sees a behavioral regression. Consider keeping the original generic retry logic and layering the rate-limit-specific handling on top of it:
async def get_completion_async(
    self, prompt: str, thinking: bool = False, max_retries: int = 0
) -> str:
    last_error = None
    for attempt in range(max(max_retries + 1, len(self._RATE_LIMIT_RETRY_DELAYS) + 1)):
        try:
            response = await client.chat.completions.create(**kwargs)
            self._update_token_usage_from_response(response)
            return response.choices[0].message.content or ""
        except openai.RateLimitError as e:
            last_error = e
            delay = self._RATE_LIMIT_RETRY_DELAYS[min(attempt, len(self._RATE_LIMIT_RETRY_DELAYS) - 1)]
            logger.warning(f"VLM rate limited (429), retrying in {delay}s...")
            await asyncio.sleep(delay)
        except Exception as e:
            last_error = e
            if attempt < max_retries:
                await asyncio.sleep(2 ** attempt)
            else:
                break
    raise last_error
    def get_completion(self, prompt: str, thinking: bool = False) -> str:
        """Get text completion"""
        import openai
[Suggestion] The bare import openai added here raises a raw ImportError when openai is not installed, bypassing the friendly ImportError("Please install openai: pip install openai") raised in get_client().
Consider moving import openai after the get_client() call, or using a unified try/except so the error message is consistent. The same applies to get_completion_async (line 93).
        return

    # Retry delays (seconds) on 429 RateLimitError before each successive attempt
    _RATE_LIMIT_RETRY_DELAYS = [10, 30, 60]
[Suggestion] get_vision_completion and get_vision_completion_async can also hit 429 rate limits, but gain no retry logic here. Consider extracting a generic retry helper (similar to _call_with_retry in the embedder) and applying it to all completion methods, so the vision path does not still fail on 429 when indexing a large batch of images.
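A shared async helper along the lines the comment suggests could look like the sketch below. The function name `call_with_rate_limit_retry` and the `is_rate_limit` predicate are assumptions for illustration; only the `_RATE_LIMIT_RETRY_DELAYS` values come from the PR.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

# Fixed delays from the PR: the final retry lands after a full 60s window reset.
_RATE_LIMIT_RETRY_DELAYS = [10, 30, 60]


async def call_with_rate_limit_retry(fn, *args, is_rate_limit, delays=None, **kwargs):
    """Await fn(); on a rate-limit error, sleep through each fixed delay and
    retry, then make one final attempt that propagates any remaining failure."""
    delays = _RATE_LIMit_RETRY_DELAYS if delays is None else delays
    for delay in delays:
        try:
            return await fn(*args, **kwargs)
        except Exception as exc:
            if not is_rate_limit(exc):
                raise  # non-rate-limit errors are not retried here
            logger.warning("Rate limited (429), retrying in %ss", delay)
            await asyncio.sleep(delay)
    return await fn(*args, **kwargs)  # final attempt
```

All four completion methods (text, vision, sync wrappers) could then delegate to this one helper instead of duplicating the loop.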
Problem
When indexing repositories with many files, the OpenAI embedder and VLM backend immediately fail on 429 RateLimitError responses. These failures are permanent: the embedding queue marks the item as errored rather than retrying, resulting in partially-indexed repositories and an unhealthy queue.

A secondary issue: the embedder sends full file content to the embeddings API without truncation. For large files this exceeds the model's context window (e.g. text-embedding-3-large has an 8192-token limit), causing 400 ContextWindowExceededError.

Changes
openviking/models/embedder/openai_embedders.py
- Added a _call_with_retry() helper that catches openai.RateLimitError and retries with increasing delays (10s → 30s → 60s) before a final attempt; applied to embed() and embed_batch()
- Added _MAX_CHARS = 30000 truncation before sending text to the API (~4 chars/token, keeping well under the 8192-token limit)

openviking/models/vlm/backends/openai_vlm.py
- Applied the _RATE_LIMIT_RETRY_DELAYS retry pattern to get_completion() and get_completion_async() using the same 10s → 30s → 60s backoff
- get_completion_async() uses await asyncio.sleep() to avoid blocking the event loop during retries

Why this approach
The rate limit window on OpenAI-compatible APIs resets every 60 seconds. A fixed delay sequence (10s → 30s → 60s) covers burst scenarios (first retry at 10s) while ensuring the final retry always lands after a full window reset. This is intentionally simple (no jitter, no exponential growth), which is appropriate for a background indexing workload where throughput matters more than precise timing.
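The _MAX_CHARS truncation described in the changes is straightforward to sketch. The helper name `truncate_for_embedding` is hypothetical; the constant and the ~4-chars-per-token approximation are taken from the PR description.

```python
_MAX_CHARS = 30000  # ~7500 tokens at ~4 chars/token, well under the 8192-token limit


def truncate_for_embedding(text: str, max_chars: int = _MAX_CHARS) -> str:
    """Clip oversized input before the embeddings call, avoiding
    400 ContextWindowExceededError on models like text-embedding-3-large."""
    return text if len(text) <= max_chars else text[:max_chars]
```

A character-count cutoff is a coarse proxy for tokens, but erring well below the limit keeps the truncation safe without pulling in a tokenizer dependency.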