-
-
Notifications
You must be signed in to change notification settings - Fork 11.5k
[Model] Allow passing custom number of max tiles to Nano 2 VL #26403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Model] Allow passing custom number of max tiles to Nano 2 VL #26403
Conversation
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
vllm/vllm/model_executor/models/nano_nemotron_vl.py
Lines 920 to 937 in 3db2899
| class NanoNemotronVLDummyInputsBuilder(BaseDummyInputsBuilder[_I]): | |
| """Basic image-only DummyInputsBuilder for InternVL-style models.""" | |
| def get_dummy_text(self, mm_counts: Mapping[str, int]) -> str: | |
| num_images = mm_counts.get("image", 0) | |
| return "<image>" * num_images | |
| def get_dummy_mm_data( | |
| self, | |
| seq_len: int, | |
| mm_counts: Mapping[str, int], | |
| mm_options: Optional[Mapping[str, BaseDummyOptions]] = None, | |
| ) -> MultiModalDataDict: | |
| # Use default max_num_tiles for dummy data generation | |
| max_num_tiles = 12 | |
| target_width, target_height = self.info.get_image_size_with_most_features( | |
| max_num_tiles |
The new max_num_tiles option is stored on the processor, but the dummy-input builder still hardcodes max_num_tiles = 12 when generating the largest test image. If a model is configured with a higher default (e.g. mm_processor_kwargs={"max_num_tiles": 24}), the dummy data used for profiling and determining token capacities will be built for only 12 tiles and the engine will under-estimate the number of image tokens. That can lead to insufficient memory/token allocation during startup and subsequent runtime failures once requests use the larger tile count. This should pull the value from the processor (or from the kwargs) instead of the literal 12.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
…roject#26403) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
…roject#26403) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
…roject#26403) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
…roject#26403) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
…roject#26403) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
…roject#26403) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
…roject#26403) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Purpose
This MR allows user to override default number of tiles for Nano 2 VL.
Examples:
Test Plan
Test Result
Omiting irrelevant lines. This output shows the dummy image is processed with 3 tiles while the user request is processed with 2 tiles as expected