
feat(vlm): add streaming response handling for OpenAI VLM #756

Merged
qin-ctx merged 2 commits into volcengine:main from KorenKrita:feat/vlm-stream-config on Mar 19, 2026

Conversation

@KorenKrita (Contributor)

Description

Add configurable stream mode for OpenAI-compatible VLM providers. Some proxy/gateway providers (e.g., catclaw-proxy, OpenRouter) force SSE (Server-Sent Events) streaming format, causing the OpenAI SDK to fail parsing responses when stream=false. This PR introduces an explicit stream configuration option so users can enable streaming mode to properly handle such providers.

This replaces the previous auto-detection approach (PR #740, reverted in #745) with a cleaner, explicit opt-in design.

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • VLMBase (openviking/models/vlm/base.py): Extracts stream from config dict (default: False)
  • OpenAIVLM (openviking/models/vlm/backends/openai_vlm.py):
    • Adds _extract_from_chunk() to extract content and token usage from a single streaming chunk
    • Adds _process_streaming_response() / _process_streaming_response_async() to accumulate chunks and track token usage
    • All four API methods now branch on self.stream: streaming path uses chunk accumulation, non-streaming path uses the original logic
  • VLMConfig (openviking_cli/utils/config/vlm_config.py):
    • Adds stream: bool field (default: False)
    • Migrates top-level stream to providers structure in _migrate_legacy_config()
    • _build_vlm_config_dict() resolves stream from provider config first, then falls back to top-level
  • Documentation: Updated docs/en/guides/01-configuration.md and docs/zh/guides/01-configuration.md with stream parameter description, usage example, and usage note
  • Tests (tests/unit/test_stream_config_vlm.py): Added comprehensive tests covering default value, stream=True/False for sync/async, vision completion, VLMConfig migration and precedence, and streaming response processing logic
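The chunk-accumulation approach described above can be sketched as follows. This is an illustrative reconstruction based on the method names in the PR description (`_extract_from_chunk`, `_process_streaming_response`); the actual signatures in `openviking/models/vlm/backends/openai_vlm.py` may differ.

```python
# Sketch of the streaming-chunk accumulation logic described above.
# Method names follow the PR description; the real implementation may differ.
from types import SimpleNamespace


def extract_from_chunk(chunk):
    """Pull content text and token usage (if present) from one streaming chunk."""
    content = ""
    if chunk.choices and chunk.choices[0].delta.content:
        content = chunk.choices[0].delta.content
    usage = getattr(chunk, "usage", None)  # usually only set on the final chunk
    return content, usage


def process_streaming_response(stream):
    """Accumulate chunk contents and keep the last non-None usage object."""
    parts, usage = [], None
    for chunk in stream:
        content, chunk_usage = extract_from_chunk(chunk)
        parts.append(content)
        if chunk_usage is not None:
            usage = chunk_usage
    return "".join(parts), usage
```

The async variant would follow the same shape with `async for` over the stream.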

Testing

  • I have added tests that prove my fix is effective or that my feature works

  • New and existing unit tests pass locally with my changes

  • I have tested this on the following platforms:

    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

  • VolcEngineVLM and LiteLLMVLMProvider backends do not consume self.stream from VLMBase. Users who set stream: true with these providers will have the config silently ignored.
  • Provider-level stream config takes precedence over top-level stream, consistent with extra_headers behavior.
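The precedence rule above can be sketched as a small resolver. The key names (`providers`, `stream`) mirror the PR description; the actual schema handled by `_build_vlm_config_dict()` may differ.

```python
# Illustrative sketch of the precedence rule: a provider-level "stream"
# wins over the top-level one, with False as the final default.
# Key names follow the PR description; the real config schema may differ.
def resolve_stream(config: dict, provider: str) -> bool:
    provider_cfg = config.get("providers", {}).get(provider, {})
    if "stream" in provider_cfg:
        return bool(provider_cfg["stream"])  # provider-level takes precedence
    return bool(config.get("stream", False))  # fall back to top-level, then False
```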

Add configurable stream mode for OpenAI-compatible VLM providers.
This enables proper handling of SSE (Server-Sent Events) responses
from providers that force streaming format.

Changes:
- VLMBase extracts stream from config (default: False)
- OpenAIVLM adds _extract_from_chunk(), _process_streaming_response(),
  _process_streaming_response_async() for streaming response processing
- All API methods (get_completion, get_completion_async, get_vision_completion,
  get_vision_completion_async) now support stream parameter
- VLMConfig adds stream field with migration to providers structure
- Update configuration docs (zh/en) with stream parameter documentation
- Add comprehensive tests for stream configuration

Co-Authored-By: KorenKrita <KorenKrita@gmail.com>
@qin-ctx (Collaborator) left a comment

Overall the design is clean — explicit opt-in is the right approach, and the config precedence logic is correct. One blocking issue in the async test mocking setup.

async def async_generator():
    for chunk in chunks:
        yield chunk

mock_client.chat.completions.create.return_value = async_generator()

[Bug] (blocking) MagicMock does not support await — this mock setup likely causes TypeError at runtime.

mock_client.chat.completions.create is a MagicMock. Calling it returns return_value (the async generator object). When the code under test does response = await client.chat.completions.create(**kwargs), Python tries to await the async generator, which raises TypeError — async generators implement __aiter__/__anext__ (async iterable protocol), not __await__ (awaitable protocol). Only AsyncMock defines __await__.

Same issue exists in test_vision_completion_async_stream_true below (line ~220).

Compare with test_async_stream_false which correctly replaces create with a real async function:

async def mock_create(*args, **kwargs):
    return mock_response
mock_client.chat.completions.create = mock_create

Suggested fix — use the same pattern:

async def mock_create(*args, **kwargs):
    return async_generator()
mock_client.chat.completions.create = mock_create
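The distinction the reviewer draws can be demonstrated in isolation: awaiting an async generator object raises TypeError, while awaiting a plain async function that returns the generator works, after which the generator can be consumed with `async for`. This is a self-contained sketch, not code from the PR's test file.

```python
# Demonstrates the review point: a MagicMock whose return_value is an async
# generator object fails under `await`, while a real async function works.
import asyncio
from unittest.mock import MagicMock


async def main():
    async def async_generator():
        for chunk in ["a", "b"]:
            yield chunk

    # Broken pattern: calling the MagicMock returns the async generator
    # object, and awaiting it raises TypeError (it is not awaitable).
    create = MagicMock(return_value=async_generator())
    try:
        await create()
        broken_ok = True
    except TypeError:
        broken_ok = False

    # Fixed pattern: a real async function is awaitable and hands back the
    # generator, which the code under test can then iterate with `async for`.
    async def mock_create(*args, **kwargs):
        return async_generator()

    stream = await mock_create()
    chunks = [c async for c in stream]
    return broken_ok, chunks


broken_ok, chunks = asyncio.run(main())
```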

@qin-ctx qin-ctx merged commit d739a5b into volcengine:main Mar 19, 2026
6 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 19, 2026