fix: add 429 rate limit retry backoff to OpenAI embedder and VLM #568
Dilan-TH wants to merge 5 commits into volcengine:main
Conversation
            last_error = e
            if attempt < max_retries:
                await asyncio.sleep(2**attempt)
    except openai.RateLimitError:
[Bug] The max_retries parameter of get_completion_async is silently ignored. The previous implementation retried on all exception types (with exponential backoff 2**attempt); the new code only retries on RateLimitError and never consults max_retries.
Any caller passing max_retries > 0 and expecting generic retries sees a behavioral regression. Consider keeping the original generic retry logic and layering the rate-limit-specific handling on top of it:
async def get_completion_async(
    self, prompt: str, thinking: bool = False, max_retries: int = 0
) -> str:
    last_error = None
    for attempt in range(max(max_retries + 1, len(self._RATE_LIMIT_RETRY_DELAYS) + 1)):
        try:
            response = await client.chat.completions.create(**kwargs)
            self._update_token_usage_from_response(response)
            return response.choices[0].message.content or ""
        except openai.RateLimitError as e:
            last_error = e
            delay = self._RATE_LIMIT_RETRY_DELAYS[min(attempt, len(self._RATE_LIMIT_RETRY_DELAYS) - 1)]
            logger.warning(f"VLM rate limited (429), retrying in {delay}s...")
            await asyncio.sleep(delay)
        except Exception as e:
            last_error = e
            if attempt < max_retries:
                await asyncio.sleep(2 ** attempt)
            else:
                break
    raise last_error
    def get_completion(self, prompt: str, thinking: bool = False) -> str:
        """Get text completion"""
        import openai
[Suggestion] The bare import openai added here raises a raw ImportError when openai is not installed, bypassing the friendly ImportError("Please install openai: pip install openai") raised in get_client().
Consider moving import openai after the get_client() call, or using a unified try/except so the error message is consistent. The same applies to get_completion_async (line 93).
        return

    # Retry delays (seconds) on 429 RateLimitError before each successive attempt
    _RATE_LIMIT_RETRY_DELAYS = [10, 30, 60]
[Suggestion] get_vision_completion and get_vision_completion_async can also hit 429 rate limits, but gain no retry logic here. Consider extracting a generic retry helper (similar to _call_with_retry in the embedder) and applying it to all completion methods, so the vision path does not still fail on 429 when indexing a large batch of images.
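A shared async helper along the lines the comment suggests could look like the sketch below. The function name `call_with_rate_limit_retry` and the `is_rate_limit` predicate are assumptions for illustration; only the `_RATE_LIMIT_RETRY_DELAYS` values come from the PR.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

# Fixed delays from the PR: the final retry lands after a full 60s window reset.
_RATE_LIMIT_RETRY_DELAYS = [10, 30, 60]


async def call_with_rate_limit_retry(fn, *args, is_rate_limit, delays=None, **kwargs):
    """Await fn(); on a rate-limit error, sleep through each fixed delay and
    retry, then make one final attempt that propagates any remaining failure."""
    delays = _RATE_LIMit_RETRY_DELAYS if delays is None else delays
    for delay in delays:
        try:
            return await fn(*args, **kwargs)
        except Exception as exc:
            if not is_rate_limit(exc):
                raise  # non-rate-limit errors are not retried here
            logger.warning("Rate limited (429), retrying in %ss", delay)
            await asyncio.sleep(delay)
    return await fn(*args, **kwargs)  # final attempt
```

All four completion methods (text, vision, sync wrappers) could then delegate to this one helper instead of duplicating the loop.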
Problem
When indexing repositories with many files, the OpenAI embedder and VLM backend immediately fail on 429 RateLimitError responses. These failures are permanent: the embedding queue marks the item as errored rather than retrying, resulting in partially-indexed repositories and an unhealthy queue.

A secondary issue: the embedder sends full file content to the embeddings API without truncation. For large files this exceeds the model's context window (e.g. text-embedding-3-large has an 8192-token limit), causing 400 ContextWindowExceededError.

Changes
openviking/models/embedder/openai_embedders.py
- Added a _call_with_retry() helper that catches openai.RateLimitError and retries with increasing delays (10s → 30s → 60s) before a final attempt; applied to embed() and embed_batch()
- Added _MAX_CHARS = 30000 truncation before sending text to the API (~4 chars/token, keeping well under the 8192-token limit)

openviking/models/vlm/backends/openai_vlm.py
- Applied the _RATE_LIMIT_RETRY_DELAYS retry pattern to get_completion() and get_completion_async() using the same 10s → 30s → 60s backoff
- get_completion_async() uses await asyncio.sleep() to avoid blocking the event loop during retries

Why this approach
The rate limit window on OpenAI-compatible APIs resets every 60 seconds. A fixed delay sequence (10s → 30s → 60s) covers burst scenarios (first retry at 10s) while ensuring the final retry always lands after a full window reset. This is intentionally simple (no jitter, no exponential growth), which is appropriate for a background indexing workload where throughput matters more than precise timing.
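The _MAX_CHARS truncation described in the changes is straightforward to sketch. The helper name `truncate_for_embedding` is hypothetical; the constant and the ~4-chars-per-token approximation are taken from the PR description.

```python
_MAX_CHARS = 30000  # ~7500 tokens at ~4 chars/token, well under the 8192-token limit


def truncate_for_embedding(text: str, max_chars: int = _MAX_CHARS) -> str:
    """Clip oversized input before the embeddings call, avoiding
    400 ContextWindowExceededError on models like text-embedding-3-large."""
    return text if len(text) <= max_chars else text[:max_chars]
```

A character-count cutoff is a coarse proxy for tokens, but erring well below the limit keeps the truncation safe without pulling in a tokenizer dependency.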