fix(embedder): auto-retry with smaller chunks when input exceeds model limit#736

Open
deepakdevp wants to merge 2 commits into volcengine:main from deepakdevp:fix/embedder-respect-model-max-tokens
Conversation

@deepakdevp
Contributor

Summary

  • When the embedding API rejects input as too large, automatically retry with chunking at half the current max_tokens instead of crashing
  • Log a warning guiding users to set embedding.dense.max_tokens in ov.conf to match their model's actual limit

Fixes #731

Root Cause

The default max_tokens is 8000 (designed for OpenAI models). Users with models that have smaller limits (e.g., bce-embedding-base_v1 with 512 max tokens) hit errors because:

  1. Token estimation uses len(text) // 3 for non-OpenAI models (no tiktoken available), which underestimates token counts for CJK text
  2. _chunk_and_embed() creates chunks of max_tokens size, but each chunk still exceeds the model's actual limit
  3. The API returns "input too large" and the embedder crashes
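The arithmetic behind point 1 can be illustrated with a hypothetical CJK passage (the one-token-per-character ratio is an assumption for illustration; real tokenizers vary):

```python
# Many embedding tokenizers emit roughly one token per CJK character,
# while the fallback heuristic divides the character count by three.
text = "深" * 600              # 600-character CJK input
estimated = len(text) // 3     # heuristic used when tiktoken is unavailable
actual_approx = len(text)      # assumed ~1 token per CJK char (illustrative)

model_limit = 512              # e.g. bce-embedding-base_v1

# The heuristic says the text fits, so no chunking happens...
assert estimated <= model_limit
# ...but the real token count exceeds the limit, and the API rejects it.
assert actual_approx > model_limit
```

With a 600-character input the heuristic reports 200 tokens, comfortably under the 512-token limit, while the real count is roughly triple that.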

Fix

In OpenAIDenseEmbedder.embed(), catch "too large/too long/maximum context length" errors from _embed_single() and retry with _chunk_and_embed() at half the max_tokens. This handles both:

  • Text that passes the estimation check but fails at the API
  • Models with smaller limits than the 8000 default
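A minimal sketch of the retry path described above, reflecting the final shape of the PR after review (method and helper names follow the PR description; the bodies here are stand-ins, not the real implementation):

```python
import logging

logger = logging.getLogger(__name__)

class OpenAIDenseEmbedder:
    def __init__(self, max_tokens: int = 8000):
        self.max_tokens = max_tokens

    def _embed_single(self, text: str, is_query: bool):
        # Stand-in: the real method calls the embedding API, which may
        # reject over-long input.
        raise RuntimeError("input is too large for this model")

    def _chunk_and_embed(self, text, is_query=False, override_max_tokens=None):
        # Stand-in: the real method splits text into chunks and embeds each.
        max_tok = override_max_tokens if override_max_tokens is not None else self.max_tokens
        return f"embedded with chunk size {max_tok}"

    def embed(self, text: str, is_query: bool = False):
        try:
            return self._embed_single(text, is_query)
        except RuntimeError as e:
            msg = str(e).lower()
            if ("too large" in msg or "too long" in msg
                    or "maximum context length" in msg):
                reduced = max(self.max_tokens // 2, 128)
                logger.warning(
                    "Embedding input exceeded the model limit; retrying with "
                    "chunk size %d. Set embedding.dense.max_tokens in ov.conf.",
                    reduced,
                )
                return self._chunk_and_embed(
                    text, is_query=is_query, override_max_tokens=reduced
                )
            raise
```

Passing override_max_tokens instead of mutating shared state keeps the retry local to the call, so concurrent embed() calls are unaffected.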

Changes Made

  • openviking/models/embedder/openai_embedders.py: Added error-driven retry in embed() with reduced chunk size and user-facing warning

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

Testing

  • Unit tests pass locally (11/11 embedder tests)
  • Tested on macOS

fix(embedder): auto-retry with smaller chunks when input exceeds model limit

When the embedding API rejects input as "too large" (common with
non-OpenAI models where token estimation is inaccurate), retry
with chunking at half the current max_tokens instead of crashing.

Also logs a warning guiding users to set embedding.dense.max_tokens
in ov.conf to match their model's actual limit.

Fixes volcengine#731.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@qin-ptr left a comment


Review Summary

This PR fixes a crash when embedding input exceeds model limits by adding automatic retry with smaller chunks. However, there is a critical concurrency bug that must be fixed before merging.

See inline comments for details.

Blocking issue: the implementation mutates the instance attribute self._max_tokens, which is not thread-safe when embed() is called concurrently via asyncio.to_thread() (see collection_schemas.py:251).

reduced = max(self.max_tokens // 2, 128)
logger.warning(
    f"Embedding failed due to input length. "
    f"Retrying with chunk size {reduced} tokens. "
    f"Set embedding.dense.max_tokens in ov.conf to your model's actual limit."
)
Contributor


[Bug] (blocking)

Concurrency Safety Issue: Directly modifying the instance attribute self._max_tokens is not thread-safe.

From collection_schemas.py:251-252, embed() is called via asyncio.to_thread() which allows concurrent execution in a thread pool. If multiple threads call embed() simultaneously:

  • Thread A sets self._max_tokens = reduced
  • Thread B reads self.max_tokens (in _chunk_and_embed() or _chunk_text()) and gets the wrong value
  • Thread A restores self._max_tokens in finally, potentially overwriting Thread B's modification

Suggested Fix: Pass max_tokens as a parameter instead of modifying the instance attribute.

Option 1: Add optional parameter to _chunk_and_embed():

def _chunk_and_embed(self, text: str, is_query: bool = False, override_max_tokens: Optional[int] = None) -> EmbedResult:
    max_tok = override_max_tokens if override_max_tokens is not None else self.max_tokens
    # Use max_tok instead of self.max_tokens

# In retry logic:
reduced = max(self.max_tokens // 2, 128)
return self._chunk_and_embed(text, is_query=is_query, override_max_tokens=reduced)

This approach is thread-safe because each thread uses its own local variable instead of modifying shared state.
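The interleaving described above can be reproduced deterministically with threading events (minimal stand-in class, not the real embedder):

```python
import threading

class SharedStateEmbedder:
    """Stand-in reproducing the buggy mutate-and-restore pattern."""
    def __init__(self):
        self._max_tokens = 8000

    def retry_with_reduced(self, reduced, mid_retry, may_restore):
        old = self._max_tokens
        self._max_tokens = reduced   # Thread A mutates shared state
        mid_retry.set()              # signal: retry is now in progress
        may_restore.wait()           # hold the reduced value briefly
        self._max_tokens = old       # restore (the finally-style cleanup)

mid_retry, may_restore = threading.Event(), threading.Event()
emb = SharedStateEmbedder()
t = threading.Thread(target=emb.retry_with_reduced,
                     args=(4000, mid_retry, may_restore))
t.start()
mid_retry.wait()
# Thread B reads while Thread A's retry is mid-flight: it sees 4000,
# even though B never asked for a reduced chunk size.
observed_by_b = emb._max_tokens
may_restore.set()
t.join()
```

Here observed_by_b ends up as 4000, not the configured 8000, which is exactly the cross-thread leakage the parameter-passing fix avoids.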

Contributor Author


Fixed. Added override_max_tokens parameter to _chunk_and_embed() and _chunk_text() in base.py. The retry now passes override_max_tokens=reduced instead of mutating self._max_tokens. Thread-safe.

except RuntimeError as e:
    error_msg = str(e).lower()
    if (
        "too large" in error_msg
        or "too long" in error_msg
        or "maximum context length" in error_msg
    ):
Contributor


[Suggestion] (non-blocking)

Error Message Matching May Be Too Broad: The current check for "too large", "too long", or "maximum context length" might match unrelated errors:

  • "Request body too large" (not a token length issue)
  • "File too large" (not an embedding input issue)

Consider more precise matching:

if (
    ("input" in error_msg and "too large" in error_msg)
    or ("token" in error_msg and ("too long" in error_msg or "too many" in error_msg))
    or "context length" in error_msg
):

Or check for specific API error codes if available.
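The suggested tighter predicate can be sanity-checked against sample messages (the error strings below are illustrative, not real API responses):

```python
def is_input_length_error(error_msg: str) -> bool:
    """Tighter match for token-length errors, as suggested above."""
    error_msg = error_msg.lower()
    return (
        ("input" in error_msg and "too large" in error_msg)
        or ("token" in error_msg
            and ("too long" in error_msg or "too many" in error_msg))
        or "context length" in error_msg
    )

# Token-length errors are still caught...
assert is_input_length_error("Input is too large for this model")
assert is_input_length_error("This model's maximum context length is 512 tokens")
# ...while unrelated size errors no longer trigger a chunked retry.
assert not is_input_length_error("Request body too large")
assert not is_input_length_error("File too large")
```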

Contributor Author


Fixed. Tightened error matching to require 'input' + 'too large', 'token' + 'too long/many', or 'context length'. Won't match unrelated 'Request body too large' errors.

…ching

- Add override_max_tokens parameter to _chunk_and_embed() and
  _chunk_text() instead of mutating self._max_tokens (thread-safe)
- Tighten error message matching to require specific patterns
  ("input" + "too large", "token" + "too long/many", "context length")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues:

  • [Bug]: Input sequence length exceeds the max input length of embedding model.