[Model] Move `vision_feature_select_strategy` into `resolve_visual_encoder_outputs` #25938

DarkLight1337 · 2025-09-30T08:37:55Z

Purpose

Remove duplicated code across models that share common vision encoders. This also avoids applying layer norm to features that are not selected.

Also, clean up the signature of `resolve_visual_encoder_outputs`.
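The ordering benefit described above can be sketched in plain Python. This is only an illustration, not the code in this PR: `select_features`, `resolve_outputs`, and `counting_norm` are hypothetical names, the `"default"` (drop the leading class token) and `"full"` (keep everything) strategies are assumed from the LLaVA-style convention, and mean-centering stands in for a real layer norm.

```python
from typing import Callable

Token = list[float]


def select_features(tokens: list[Token], strategy: str) -> list[Token]:
    """Hypothetical selector: "default" drops the first token, "full" keeps all."""
    if strategy == "default":
        return tokens[1:]
    if strategy == "full":
        return tokens
    raise ValueError(f"Unexpected strategy: {strategy!r}")


def resolve_outputs(
    tokens: list[Token],
    norm: Callable[[Token], Token],
    strategy: str,
) -> list[Token]:
    """Select first, then normalize only the tokens that survive selection."""
    selected = select_features(tokens, strategy)
    return [norm(tok) for tok in selected]


norm_calls = 0


def counting_norm(tok: Token) -> Token:
    """Stand-in for a layer norm (mean-centering only); counts its invocations."""
    global norm_calls
    norm_calls += 1
    mean = sum(tok) / len(tok)
    return [x - mean for x in tok]


# Three tokens; the first plays the role of the class token.
tokens = [[9.0, 9.0], [1.0, 3.0], [2.0, 6.0]]
features = resolve_outputs(tokens, counting_norm, "default")
print(features, norm_calls)
```

With the `"default"` strategy the stand-in norm runs on two tokens instead of three, which is the saving the description refers to, assuming selection happens before normalization.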

Test Plan

Unblock all multimodal tests

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

…coder_outputs` Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

gemini-code-assist

Code Review

This pull request is a nice refactoring that centralizes the vision_feature_select_strategy logic into the resolve_visual_encoder_outputs utility function. This successfully removes duplicated code across several vision models, improving maintainability. The changes are consistent and well-applied across all relevant files.

However, I've found a critical issue in the resolve_visual_encoder_outputs function in vllm/model_executor/models/vision.py. The logic for applying post_layer_norm will cause a runtime error, and the condition to check if the last layer is being used is also incorrect. I've provided a detailed comment with a suggested fix for this.

gemini-code-assist · 2025-09-30T08:40:06Z

vllm/model_executor/models/vision.py

+    uses_last_layer = select_layers[-1] in (len(hs_pool) - 1, -1)
    if post_layer_norm is not None and uses_last_layer:
        hs_pool[-1] = post_layer_norm(encoder_outputs)


There appear to be two issues in this block of code:

1. The condition to check if the last layer is being used seems incorrect. uses_last_layer is checked against len(hs_pool) - 1, which is len(select_layers) - 1. This doesn't seem to correctly identify if the last layer of the encoder is being used. It should probably check against max_possible_layers, for example: select_layers[-1] in (max_possible_layers - 1, -1).

2. post_layer_norm is being called with encoder_outputs, which is a list of tensors when select_layers is provided. This will cause a runtime error. It should be called with the last hidden state tensor, which is encoder_outputs[-1].

Here is a suggested fix for both issues:

Suggested change

-    uses_last_layer = select_layers[-1] in (len(hs_pool) - 1, -1)
-    if post_layer_norm is not None and uses_last_layer:
-        hs_pool[-1] = post_layer_norm(encoder_outputs)
+    uses_last_layer = select_layers[-1] in (max_possible_layers - 1, -1)
+    if post_layer_norm is not None and uses_last_layer:
+        hs_pool[-1] = post_layer_norm(encoder_outputs[-1])

These suggestions look reasonable, so I have applied them. cc @alex-jw-brooks
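
The first issue raised in the review can be reproduced without any tensors. The sketch below compares the two membership checks on plain integers; the function names and the layer count of 24 are hypothetical, and `max_possible_layers` is assumed here to mean the total number of hidden states the full encoder can produce.

```python
def uses_last_layer_pool_len(select_layers: list[int], pool_len: int) -> bool:
    """Original check: compares against the number of *selected* layers."""
    return select_layers[-1] in (pool_len - 1, -1)


def uses_last_layer_total(select_layers: list[int], max_possible_layers: int) -> bool:
    """Suggested check: compares against the encoder's total layer count."""
    return select_layers[-1] in (max_possible_layers - 1, -1)


TOTAL_LAYERS = 24  # hypothetical encoder depth, for illustration only

for select_layers in ([0, 1], [3, 23], [-1]):
    pool_len = len(select_layers)  # hs_pool holds one entry per selected layer
    print(
        select_layers,
        uses_last_layer_pool_len(select_layers, pool_len),
        uses_last_layer_total(select_layers, TOTAL_LAYERS),
    )
```

Under these assumptions, `[0, 1]` is wrongly reported as using the last layer by the pool-length check, `[3, 23]` is wrongly reported as not using it, and only the negative index `[-1]` is handled the same way by both versions.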

Isotr0py

LGTM

[Model] Move vision_feature_select_strategy into `resolve_visual_en…

901e13c

…coder_outputs` Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

DarkLight1337 requested review from hmellor and Isotr0py September 30, 2025 08:37

DarkLight1337 requested review from patrickvonplaten and ywang96 as code owners September 30, 2025 08:37

DarkLight1337 added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Sep 30, 2025

mergify bot added the `multi-modality` label (Related to multi-modality (#4194)) Sep 30, 2025

gemini-code-assist bot reviewed Sep 30, 2025

DarkLight1337 added 2 commits September 30, 2025 08:43

Apply suggestions

33f6e95

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Avoid unnecessary iteration

d220ff6

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Isotr0py approved these changes Sep 30, 2025

DarkLight1337 enabled auto-merge (squash) September 30, 2025 08:55

Fix

88d9dd3

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

DarkLight1337 force-pushed the mv-vision-feature-select-strategy branch from 46f187a to 88d9dd3 Compare September 30, 2025 09:37

DarkLight1337 merged commit d7e34b4 into vllm-project:main Sep 30, 2025
52 checks passed

DarkLight1337 deleted the mv-vision-feature-select-strategy branch September 30, 2025 11:24

lkm2835 added a commit to lkm2835/vllm that referenced this pull request Sep 30, 2025

Merge branch '250930 vllm-project#25938' into vllm-exaone4_5-dev

7ff9541

DarkLight1337 mentioned this pull request Oct 1, 2025

[Misc] Factor out common _apply_feature_select_strategy #26003

Merged

5 tasks

pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025

[Model] Move vision_feature_select_strategy into `resolve_visual_en…

226073e

…coder_outputs` (vllm-project#25938) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[Model] Move vision_feature_select_strategy into `resolve_visual_en…

a189846

…coder_outputs` (#25938) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: yewentao256 <zhyanwentao@126.com>
