fix(vlm): omit unset default max_tokens #2949

Merged
chenjw merged 2 commits into main from fix/vlm-omit-default-max-tokens on Jul 2, 2026

@qin-ctx (Collaborator) commented Jul 2, 2026

Description

Remove the remaining hardcoded VLM max_tokens=32768 fallbacks outside the OpenAI backend. When vlm.max_tokens is not configured, these backends now omit the token-limit parameter and let the selected model/provider apply its own default.

Related Issue

Fixes #2751

Follow-up to #2946, which already removed the OpenAI backend fallback.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Stop sending max_tokens when vlm.max_tokens is unset in LiteLLM and VolcEngine VLM backends.
  • Remove the Kimi-specific default max_tokens constant and fallback.
  • Remove the stale Kimi unit-test assertion for the old default.
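
The pattern behind these changes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `build_completion_kwargs` and its signature are hypothetical, and the actual LiteLLM, VolcEngine, and Kimi backends may assemble their request parameters differently.

```python
from typing import Any, Optional


def build_completion_kwargs(
    model: str,
    messages: list[dict[str, Any]],
    max_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble request parameters (hypothetical helper, for illustration only).

    max_tokens mirrors the vlm.max_tokens setting; None means "not configured".
    """
    kwargs: dict[str, Any] = {"model": model, "messages": messages}
    # Forward the limit only when it was set explicitly. When it is unset,
    # the key is left out entirely so the provider applies its own default
    # instead of receiving a hardcoded fallback such as 32768.
    if max_tokens is not None:
        kwargs["max_tokens"] = max_tokens
    return kwargs
```

With `max_tokens` left at `None`, the returned dict has no `max_tokens` key at all, which is the behavior this PR aims for when `vlm.max_tokens` is unset.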

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Tests were not run locally. This is a small request-parameter change, and the earlier `uv run` validation was interrupted before it completed.

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

N/A

Additional Notes

Leaving the parameter unset avoids hardcoding one completion-token budget for models and providers with different limits.

@qin-ctx force-pushed the fix/vlm-omit-default-max-tokens branch from 2872791 to 2b1dacb on July 2, 2026 08:05
@chenjw merged commit 2bf9e49 into main on Jul 2, 2026
3 checks passed
@github-project-automation (bot) moved this from Backlog to Done in the OpenViking project on Jul 2, 2026
@chenjw deleted the fix/vlm-omit-default-max-tokens branch on July 2, 2026 09:07

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug] VLM default max_tokens=32768 exceeds completion-token limit of common models (e.g. gpt-4o-mini → 400), silently yielding 0 extracted memories
