
feat(vlm): add streaming response handling for OpenAI VLM #756

Merged
qin-ctx merged 2 commits into volcengine:main from KorenKrita:feat/vlm-stream-config on Mar 19, 2026

Conversation

@KorenKrita (Contributor)

Description

Add configurable stream mode for OpenAI-compatible VLM providers. Some proxy/gateway providers (e.g., catclaw-proxy, OpenRouter) force SSE (Server-Sent Events) streaming format, causing the OpenAI SDK to fail parsing responses when stream=false. This PR introduces an explicit stream configuration option so users can enable streaming mode to properly handle such providers.

This replaces the previous auto-detection approach (PR #740, reverted in #745) with a cleaner, explicit opt-in design.

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • VLMBase (openviking/models/vlm/base.py): Extracts stream from config dict (default: False)
  • OpenAIVLM (openviking/models/vlm/backends/openai_vlm.py):
    • Adds _extract_from_chunk() to extract content and token usage from a single streaming chunk
    • Adds _process_streaming_response() / _process_streaming_response_async() to accumulate chunks and track token usage
    • All four API methods now branch on self.stream: streaming path uses chunk accumulation, non-streaming path uses the original logic
  • VLMConfig (openviking_cli/utils/config/vlm_config.py):
    • Adds stream: bool field (default: False)
    • Migrates top-level stream to providers structure in _migrate_legacy_config()
    • _build_vlm_config_dict() resolves stream from provider config first, then falls back to top-level
  • Documentation: Updated docs/en/guides/01-configuration.md and docs/zh/guides/01-configuration.md with stream parameter description, usage example, and usage note
  • Tests (tests/unit/test_stream_config_vlm.py): Added comprehensive tests covering default value, stream=True/False for sync/async, vision completion, VLMConfig migration and precedence, and streaming response processing logic
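The chunk-accumulation approach described above can be sketched as follows. This is an illustrative reconstruction based on the method names in the PR description (`_extract_from_chunk`, `_process_streaming_response`); the actual signatures in `openviking/models/vlm/backends/openai_vlm.py` may differ.

```python
# Sketch of the streaming-chunk accumulation logic described above.
# Method names follow the PR description; the real implementation may differ.
from types import SimpleNamespace


def extract_from_chunk(chunk):
    """Pull content text and token usage (if present) from one streaming chunk."""
    content = ""
    if chunk.choices and chunk.choices[0].delta.content:
        content = chunk.choices[0].delta.content
    usage = getattr(chunk, "usage", None)  # usually only set on the final chunk
    return content, usage


def process_streaming_response(stream):
    """Accumulate chunk contents and keep the last non-None usage object."""
    parts, usage = [], None
    for chunk in stream:
        content, chunk_usage = extract_from_chunk(chunk)
        parts.append(content)
        if chunk_usage is not None:
            usage = chunk_usage
    return "".join(parts), usage
```

The async variant would follow the same shape with `async for` over the stream.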

Testing

  • I have added tests that prove my fix is effective or that my feature works

  • New and existing unit tests pass locally with my changes

  • I have tested this on the following platforms:

    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

  • VolcEngineVLM and LiteLLMVLMProvider backends do not consume self.stream from VLMBase. Users who set stream: true with these providers will have the config silently ignored.
  • Provider-level stream config takes precedence over top-level stream, consistent with extra_headers behavior.
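The precedence rule above can be sketched as a small resolver. The key names (`providers`, `stream`) mirror the PR description; the actual schema handled by `_build_vlm_config_dict()` may differ.

```python
# Illustrative sketch of the precedence rule: a provider-level "stream"
# wins over the top-level one, with False as the final default.
# Key names follow the PR description; the real config schema may differ.
def resolve_stream(config: dict, provider: str) -> bool:
    provider_cfg = config.get("providers", {}).get(provider, {})
    if "stream" in provider_cfg:
        return bool(provider_cfg["stream"])  # provider-level takes precedence
    return bool(config.get("stream", False))  # fall back to top-level, then False
```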

Add configurable stream mode for OpenAI-compatible VLM providers.
This enables proper handling of SSE (Server-Sent Events) responses
from providers that force streaming format.

Changes:
- VLMBase extracts stream from config (default: False)
- OpenAIVLM adds _extract_from_chunk(), _process_streaming_response(),
  _process_streaming_response_async() for streaming response processing
- All API methods (get_completion, get_completion_async, get_vision_completion,
  get_vision_completion_async) now support stream parameter
- VLMConfig adds stream field with migration to providers structure
- Update configuration docs (zh/en) with stream parameter documentation
- Add comprehensive tests for stream configuration

Co-Authored-By: KorenKrita <KorenKrita@gmail.com>
@qin-ctx (Collaborator) left a comment

Overall the design is clean — explicit opt-in is the right approach, and the config precedence logic is correct. One blocking issue in the async test mocking setup.

async def async_generator():
    for chunk in chunks:
        yield chunk

mock_client.chat.completions.create.return_value = async_generator()

[Bug] (blocking) MagicMock does not support await — this mock setup likely causes TypeError at runtime.

mock_client.chat.completions.create is a MagicMock. Calling it returns return_value (the async generator object). When the code under test does response = await client.chat.completions.create(**kwargs), Python tries to await the async generator, which raises TypeError — async generators implement __aiter__/__anext__ (async iterable protocol), not __await__ (awaitable protocol). Only AsyncMock defines __await__.

Same issue exists in test_vision_completion_async_stream_true below (line ~220).

Compare with test_async_stream_false which correctly replaces create with a real async function:

async def mock_create(*args, **kwargs):
    return mock_response
mock_client.chat.completions.create = mock_create

Suggested fix — use the same pattern:

async def mock_create(*args, **kwargs):
    return async_generator()
mock_client.chat.completions.create = mock_create
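The distinction the reviewer draws can be demonstrated in isolation: awaiting an async generator object raises TypeError, while awaiting a plain async function that returns the generator works, after which the generator can be consumed with `async for`. This is a self-contained sketch, not code from the PR's test file.

```python
# Demonstrates the review point: a MagicMock whose return_value is an async
# generator object fails under `await`, while a real async function works.
import asyncio
from unittest.mock import MagicMock


async def main():
    async def async_generator():
        for chunk in ["a", "b"]:
            yield chunk

    # Broken pattern: calling the MagicMock returns the async generator
    # object, and awaiting it raises TypeError (it is not awaitable).
    create = MagicMock(return_value=async_generator())
    try:
        await create()
        broken_ok = True
    except TypeError:
        broken_ok = False

    # Fixed pattern: a real async function is awaitable and hands back the
    # generator, which the code under test can then iterate with `async for`.
    async def mock_create(*args, **kwargs):
        return async_generator()

    stream = await mock_create()
    chunks = [c async for c in stream]
    return broken_ok, chunks


broken_ok, chunks = asyncio.run(main())
```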

@qin-ctx qin-ctx merged commit d739a5b into volcengine:main Mar 19, 2026
6 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 19, 2026