Skip to content

Conversation

danielafrimi
Copy link
Contributor

@danielafrimi danielafrimi commented Sep 10, 2025

This PR implements support for the C-RADIO (Retrieval-Augmented Dual Instruction Optimization) vision encoder in vLLM, enabling its use with multimodal models like Nano Nemotron VL.

Changes

New Radio Model Implementation (vllm/model_executor/models/radio.py)

  • RadioInternVisionModel: Core vision model using InternVision encoder architecture

Integration Updates (vllm/model_executor/models/nano_nemotron_vl.py)

  • Updated Nano Nemotron VL to use the new RadioModel for vision processing

Testing (tests/models/multimodal/pooling/test_radio.py)

  • Comprehensive tests for RADIO model with nvidia/C-RADIOv2-H
  • Validates output consistency between HuggingFace and vLLM implementations

Technical Notes

Hardcoded Values: The implementation preserves hardcoded values from the original timm package implementation, including OpenAI CLIP normalization constants and predefined ViT model dimensions, ensuring compatibility and reproducibility.

Configuration: Create new configuration approach to instantiate the Radio model based on InterVision model architecture, with dynamic parameter mapping for different ViT variants.

Weight Loading: Custom weight loader handles mapping between HuggingFace and vLLM parameter names, supporting models with radio_model. prefix while skipping unused parameters.

@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Sep 10, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for the RADIO vision encoder, enabling its use in multimodal models like Nano Nemotron VL. The changes include a new RadioModel implementation, integration into NanoNemotronVL, and corresponding tests. While the implementation is comprehensive, there are a few critical issues that need to be addressed. A potential crash due to unsafe dictionary access in the configuration helper needs to be fixed. The vLLM implementation of RadioInternVisionModel is missing a final normalization layer present in the original model, which will lead to incorrect outputs. Additionally, a bug in the test file could lead to incorrect or inefficient test execution. There are also opportunities to make the weight loading logic more robust by handling unexpected weights.

Comment on lines +454 to +496
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The RadioInternVisionModel implementation is missing the final normalization layer that is present in the original HuggingFace RadioInternVisionModel. The original model applies a norm layer after the encoder. This omission will lead to incorrect model outputs.

Additionally, the load_weights method in RadioModel silently ignores weights that it doesn't recognize, including the weights for this missing normalization layer (model.norm.weight and model.norm.bias). This makes the issue harder to detect.

You should add the final normalization layer to RadioInternVisionModel and update RadioModel.load_weights to handle its weights.

@danielafrimi
Copy link
Contributor Author

@DarkLight1337 Fixed the comments.

Copy link

mergify bot commented Sep 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @danielafrimi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 16, 2025
Signed-off-by: root <root@cw-dfw-h100-001-305-026.cm.cluster>

Signed-off-by:  <>

cr

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>

cr

Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, thanks

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 16, 2025 14:37
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 16, 2025
@vllm-bot vllm-bot merged commit 252ada5 into vllm-project:main Sep 17, 2025
46 of 48 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Co-authored-by: root <root@cw-dfw-h100-001-305-026.cm.cluster>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Co-authored-by: root <root@cw-dfw-h100-001-305-026.cm.cluster>
Signed-off-by: charlifu <charlifu@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-modality Related to multi-modality (#4194) ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants