feat(vlm): add streaming response handling for OpenAI VLM #740
Merged
MaojiaSheng merged 1 commit into volcengine:main on Mar 18, 2026
Conversation
Add support for handling SSE streaming responses from APIs that force streaming format even when stream=False is requested.

Changes:
- Add _extract_content_and_usage() for sync responses
- Add _extract_content_and_usage_async() for async responses
- Add _handle_response() and _handle_response_async() for response handling
- Add _finalize_response() to eliminate duplicate post-processing logic
- Add _process_streaming_chunks() to reduce code duplication
- Add _extract_content_from_chunk() and _extract_usage_from_chunk() helpers
- Add response type detection with basic type filtering (str/list/dict/bytes)
- Update all completion methods to use new handlers
- Remove redundant _update_token_usage_from_response calls
- Add warning log for empty responses
- Add comprehensive unit tests

Refinements:
- Add choices attribute check to _iterator detection (avoid false positives)
- Add docstring warning that streaming response is consumed
- Add comment explaining async version doesn't reuse _process_streaming_chunks
- Fix test assertions to use correct get_token_usage_summary() method

Co-Authored-By: KorenKrita <KorenKrita@gmail.com>
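The helper bodies themselves are not visible in this conversation view. As a hedged illustration only, here is a minimal sketch of what the two chunk helpers named above could look like; the attribute paths (`choices[0].delta.content`, `usage.total_tokens`) are assumed from the OpenAI SDK chunk schema, and the PR's actual implementations may differ:

```python
# Hypothetical sketch of the chunk helpers named in this commit message;
# attribute paths assume OpenAI-style SSE chunk objects.
from typing import Any, Optional


def _extract_content_from_chunk(chunk: Any) -> str:
    """Return the text delta carried by one SSE chunk, or '' if absent."""
    choices = getattr(chunk, "choices", None)
    if not choices:
        return ""
    delta = getattr(choices[0], "delta", None)
    content = getattr(delta, "content", None) if delta is not None else None
    return content or ""


def _extract_usage_from_chunk(chunk: Any) -> Optional[Any]:
    """Return the chunk's token usage if present and non-zero, else None.

    In OpenAI-style streams, usage normally arrives only on the final
    chunk, so callers keep the last non-None value they see.
    """
    usage = getattr(chunk, "usage", None)
    if usage is not None and getattr(usage, "total_tokens", 0):
        return usage
    return None
```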
MaojiaSheng approved these changes on Mar 18, 2026
KorenKrita added a commit to KorenKrita/OpenViking that referenced this pull request on Mar 18, 2026:

…lcengine#740)"

This reverts commit 247293b.
qin-ctx pushed a commit that referenced this pull request on Mar 18, 2026
chethanuk added a commit to chethanuk/OpenViking that referenced this pull request on Mar 19, 2026:

- Add .pr_agent.toml with 15 repo-specific review rules derived from real bug history (PRs volcengine#505, volcengine#728, volcengine#749, volcengine#740/volcengine#745, volcengine#754, volcengine#735, volcengine#767)
- Rules structured as WHEN/THEN/BECAUSE for deterministic enforcement
- Add 8 custom labels (memory-pipeline, async-change, api-breaking, etc.)
- Add ignore patterns for lock files, third_party, build artifacts
- Enable score review, TODO scan, split-PR detection, security audit
- Configure improve tool with quality threshold and extended mode
- Configure describe tool with PR diagrams and semantic file types
- Update workflow: ark-code-latest model, checkout step for .pr_agent.toml, move all config from inline YAML to .pr_agent.toml (single source of truth)
qin-ctx pushed a commit that referenced this pull request on Mar 19, 2026:

…#780)
Description
Some OpenAI-compatible APIs (e.g., certain third-party proxies or gateways) return SSE streaming responses even when `stream=False` is requested. This causes the existing `OpenAIVLM` backend to crash with `AttributeError` when it tries to access `response.choices[0].message.content` directly.

This change adds automatic detection and adaptation to `OpenAIVLM`: when the API returns a streaming response regardless of the requested format, the backend now transparently consumes the stream, concatenates the content, and correctly extracts token usage. Normal non-streaming responses are unaffected; no caller changes are required.
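For illustration, a minimal sketch of what such detection can look like; this is hypothetical code, not the PR's actual implementation, and it assumes OpenAI-style response objects where a non-streaming completion exposes a `choices` list and a stream is an iterator of chunks:

```python
# Hypothetical illustration of the detection heuristic; the PR's actual
# _is_streaming_response() may differ in detail.
from typing import Any


def _is_streaming_response(response: Any) -> bool:
    """Return True if `response` looks like an SSE chunk stream.

    Basic containers are iterable but are clearly not chunk streams, so
    they are filtered out first. A regular chat completion object may
    also be iterable (pydantic models, for instance, iterate over their
    fields), but it carries a top-level `choices` list, so the presence
    of `choices` is treated as proof of a non-streaming response.
    """
    if isinstance(response, (str, list, dict, bytes)):
        return False
    if hasattr(response, "choices"):
        return False
    return hasattr(response, "__iter__") or hasattr(response, "_iterator")
```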
Related Issue

Type of Change
Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Test update
Changes Made
- Add `_is_streaming_response()` and `_is_async_streaming_response()` to detect streaming vs non-streaming responses by checking `__iter__`/`__aiter__` and `choices` attributes, with exclusion of basic types (str/list/dict)
- Add `_extract_content_from_chunk()` and `_extract_usage_from_chunk()` helpers to extract content and token usage from individual SSE chunks
- Add a `_process_streaming_chunks()` sync method that consumes all chunks, concatenates content, and records the last non-zero usage (see the sketch after this list)
- Add `_extract_content_and_usage()` / `_extract_content_and_usage_async()` unified entry points that auto-select streaming or non-streaming processing based on response type
- Add `_handle_response()` / `_handle_response_async()` and `_finalize_response()` for common post-processing (empty response warnings, token usage update)
- Update `get_completion`, `get_completion_async`, `get_vision_completion`, and `get_vision_completion_async` to use the new response handling pipeline instead of direct attribute access
- Add `tests/unit/test_openai_vlm_streaming.py` with 5 test classes and 14 test cases covering response type detection (`__iter__`/`__aiter__`/`_iterator`/basic type exclusion)
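As a rough sketch of the sync consumption path, hedged accordingly: the function name mirrors the helper listed above, but the body is an assumption built on OpenAI-style chunk objects, not the PR's actual code:

```python
# Hypothetical sketch of the sync consumption path; the PR's actual
# _process_streaming_chunks() may differ.
from typing import Any, Iterable, Optional, Tuple


def _process_streaming_chunks(stream: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Drain OpenAI-style SSE chunks into (full_content, last_usage).

    Warning: this consumes the stream; it cannot be iterated again.
    """
    parts: list[str] = []
    usage: Optional[Any] = None
    for chunk in stream:
        choices = getattr(chunk, "choices", None)
        delta = getattr(choices[0], "delta", None) if choices else None
        content = getattr(delta, "content", None) if delta is not None else None
        if content:
            parts.append(content)
        # Usage typically arrives only on the final chunk, so record the
        # last non-zero value instead of summing per-chunk counters.
        chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage is not None and getattr(chunk_usage, "total_tokens", 0):
            usage = chunk_usage
    return "".join(parts), usage
```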
Testing

I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have tested this on the following platforms:
Linux
macOS
Windows
Checklist
My code follows the project's coding style
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published
Screenshots (if applicable)
Additional Notes