feat(vlm): add streaming response handling for OpenAI VLM#756
qin-ctx merged 2 commits into volcengine:main
Conversation
Add configurable stream mode for OpenAI-compatible VLM providers. This enables proper handling of SSE (Server-Sent Events) responses from providers that force streaming format.

Changes:
- VLMBase extracts `stream` from config (default: `False`)
- OpenAIVLM adds `_extract_from_chunk()`, `_process_streaming_response()`, and `_process_streaming_response_async()` for streaming response processing
- All API methods (`get_completion`, `get_completion_async`, `get_vision_completion`, `get_vision_completion_async`) now support the stream parameter
- VLMConfig adds a `stream` field with migration to the `providers` structure
- Update configuration docs (zh/en) with `stream` parameter documentation
- Add comprehensive tests for stream configuration

Co-Authored-By: KorenKrita <KorenKrita@gmail.com>
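As a rough illustration of the helpers named above, here is a simplified, self-contained sketch of chunk accumulation. The chunk objects are simulated with `SimpleNamespace` rather than real OpenAI SDK types, and the function bodies are illustrative stand-ins, not the PR's actual implementation:

```python
from types import SimpleNamespace


def _extract_from_chunk(chunk):
    # Pull the content delta and usage (if present) from one streaming chunk.
    content = ""
    if chunk.choices and chunk.choices[0].delta.content:
        content = chunk.choices[0].delta.content
    return content, chunk.usage


def _process_streaming_response(chunks):
    # Accumulate content across chunks; usage typically arrives on the final chunk.
    parts, usage = [], None
    for chunk in chunks:
        content, chunk_usage = _extract_from_chunk(chunk)
        parts.append(content)
        if chunk_usage is not None:
            usage = chunk_usage
    return "".join(parts), usage


# Simulated chunks in roughly the shape the OpenAI SDK streams.
chunks = [
    SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))],
        usage=None,
    )
    for text in ["Hel", "lo"]
]
chunks.append(SimpleNamespace(choices=[], usage=SimpleNamespace(total_tokens=5)))

text, usage = _process_streaming_response(chunks)
print(text, usage.total_tokens)
```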
qin-ctx
left a comment
Overall the design is clean — explicit opt-in is the right approach, and the config precedence logic is correct. One blocking issue in the async test mocking setup.
tests/unit/test_stream_config_vlm.py (outdated)

```python
for chunk in chunks:
    yield chunk


mock_client.chat.completions.create.return_value = async_generator()
```
[Bug] (blocking) MagicMock does not support await — this mock setup likely causes TypeError at runtime.
mock_client.chat.completions.create is a MagicMock. Calling it returns return_value (the async generator object). When the code under test does response = await client.chat.completions.create(**kwargs), Python tries to await the async generator, which raises TypeError — async generators implement __aiter__/__anext__ (async iterable protocol), not __await__ (awaitable protocol). Only AsyncMock defines __await__.
Same issue exists in test_vision_completion_async_stream_true below (line ~220).
Compare with `test_async_stream_false`, which correctly replaces `create` with a real async function:

```python
async def mock_create(*args, **kwargs):
    return mock_response

mock_client.chat.completions.create = mock_create
```

Suggested fix: use the same pattern:

```python
async def mock_create(*args, **kwargs):
    return async_generator()

mock_client.chat.completions.create = mock_create
```
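A minimal, self-contained demonstration of why the two patterns behave differently (independent of the PR's code; the mock attribute names here are illustrative). An async generator implements `__aiter__`/`__anext__` but not `__await__`, so awaiting a `MagicMock` call that returns one raises `TypeError`, while a real coroutine function works:

```python
import asyncio
from unittest.mock import MagicMock


async def async_generator():
    # Async generator: async-iterable, but NOT awaitable.
    for chunk in ["a", "b"]:
        yield chunk


async def broken():
    mock = MagicMock()
    mock.create.return_value = async_generator()
    # mock.create() returns the async generator object; awaiting it raises TypeError.
    await mock.create()


async def fixed():
    mock = MagicMock()

    async def mock_create(*args, **kwargs):
        # Real coroutine function: awaiting it yields the generator object.
        return async_generator()

    mock.create = mock_create
    gen = await mock.create()
    return [chunk async for chunk in gen]


try:
    asyncio.run(broken())
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

chunks = asyncio.run(fixed())
print(outcome, chunks)
```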
Description
Add configurable `stream` mode for OpenAI-compatible VLM providers. Some proxy/gateway providers (e.g., catclaw-proxy, OpenRouter) force SSE (Server-Sent Events) streaming format, causing the OpenAI SDK to fail parsing responses when `stream=false`. This PR introduces an explicit `stream` configuration option so users can enable streaming mode to properly handle such providers.

This replaces the previous auto-detection approach (PR #740, reverted in #745) with a cleaner, explicit opt-in design.
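A hypothetical configuration sketch showing where the option would sit — the exact keys and layout are assumptions for illustration, not the project's actual schema:

```yaml
# Illustrative only: key names and nesting are assumed, not taken from the docs.
vlm:
  stream: false                # top-level default
  providers:
    openai_compatible:
      base_url: https://example-gateway.invalid/v1
      stream: true             # provider-level value wins for this provider
```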
Related Issue
Type of Change
Changes Made
- **VLMBase** (`openviking/models/vlm/base.py`): extracts `stream` from the config dict (default: `False`)
- **OpenAIVLM** (`openviking/models/vlm/backends/openai_vlm.py`):
  - `_extract_from_chunk()` extracts content and token usage from a single streaming chunk
  - `_process_streaming_response()` / `_process_streaming_response_async()` accumulate chunks and track token usage
  - API methods branch on `self.stream`: the streaming path uses chunk accumulation, the non-streaming path uses the original logic
- **VLMConfig** (`openviking_cli/utils/config/vlm_config.py`):
  - adds a `stream: bool` field (default: `False`)
  - migrates `stream` to the `providers` structure in `_migrate_legacy_config()`
  - `_build_vlm_config_dict()` resolves `stream` from the provider config first, then falls back to the top level
- **Docs**: updated `docs/en/guides/01-configuration.md` and `docs/zh/guides/01-configuration.md` with a `stream` parameter description, usage example, and usage note
- **Tests** (`tests/unit/test_stream_config_vlm.py`): added comprehensive tests covering the default value, stream=True/False for sync/async, vision completion, VLMConfig migration and precedence, and the streaming response processing logic

Testing
- I have added tests that prove my fix is effective or that my feature works
- New and existing unit tests pass locally with my changes
- I have tested this on the following platforms:
Checklist
Screenshots (if applicable)
Additional Notes
- The `VolcEngineVLM` and `LiteLLMVLMProvider` backends do not consume `self.stream` from `VLMBase`. Users who set `stream: true` with these providers will have the config silently ignored.
- Provider-level `stream` config takes precedence over top-level `stream`, consistent with `extra_headers` behavior.
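The precedence rule in the second note can be sketched as follows — a hypothetical helper for illustration, not the PR's actual `_build_vlm_config_dict()` implementation:

```python
def resolve_stream(provider_cfg: dict, top_level: dict) -> bool:
    # Provider-level value wins; fall back to top-level; default to False.
    if "stream" in provider_cfg:
        return provider_cfg["stream"]
    return top_level.get("stream", False)


print(resolve_stream({"stream": True}, {"stream": False}))  # provider wins
print(resolve_stream({}, {"stream": True}))                 # falls back to top level
print(resolve_stream({}, {}))                               # default
```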