
Conversation


@BloodAxe BloodAxe commented Oct 8, 2025

Purpose

This PR allows users to override the default number of tiles for Nano 2 VL.

Examples:

# Global override for all requests
llm = LLM(
    model_path,
    ...
    mm_processor_kwargs=dict(max_num_tiles=3),
)
# Per-request
llm_inputs = {
    "prompt": prompt,
    "mm_processor_kwargs": dict(max_num_tiles=2),
    "multi_modal_data": {
        "image": image,
    },
}
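
For context, per-request mm_processor_kwargs take precedence over the kwargs passed to the LLM constructor, which in turn override the model's built-in default. A minimal sketch of that precedence (the helper name resolve_max_num_tiles and the fallback default of 12 are illustrative, not part of the vLLM API):

def resolve_max_num_tiles(
    request_kwargs: dict | None,
    engine_kwargs: dict | None,
    default: int = 12,  # assumed model default, not confirmed by this PR
) -> int:
    # Per-request kwargs win over engine-level kwargs; the model
    # default applies only when neither specifies max_num_tiles.
    for kwargs in (request_kwargs, engine_kwargs):
        if kwargs and "max_num_tiles" in kwargs:
            return kwargs["max_num_tiles"]
    return default

# With the settings above:
# resolve_max_num_tiles(dict(max_num_tiles=2), dict(max_num_tiles=3)) == 2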

Test Plan

import cv2
from PIL import Image
from vllm import LLM

llm = LLM(
    model_path,
    trust_remote_code=True,
    mm_processor_kwargs=dict(max_num_tiles=3),
)

# Upscale the test image 3x so it spans multiple tiles; OpenCV loads
# images as BGR, so convert to RGB before building the PIL image.
image = cv2.resize(cv2.imread("yach_image.jpeg"), dsize=None, fx=3, fy=3)
image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

llm_inputs = {
    "prompt": prompt,
    "mm_processor_kwargs": dict(max_num_tiles=2),
    "multi_modal_data": {
        "image": image,
    },
}

outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
generated_text = outputs[0].outputs[0].text
print(generated_text)
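
The snippet above assumes prompt and sampling_params are defined elsewhere; a minimal setup might look like the following (the prompt template is illustrative and should follow the model's actual chat template):

from vllm import SamplingParams

# Illustrative values only; the exact prompt format depends on the
# model's chat template.
prompt = "<image>\nDescribe this image in one sentence."
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)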

Test Result

Omitting irrelevant lines. This output shows that the dummy image is processed with 3 tiles while the user request is processed with 2 tiles, as expected.

INFO 10-08 11:20:48 [__init__.py:224] Automatically detected platform cuda.
INFO 10-08 11:20:49 [utils.py:239] non-default args: {'trust_remote_code': True, 'gpu_memory_utilization': 0.75, 'disable_log_stats': True, 'enforce_eager': True, 'mm_processor_kwargs': {'max_num_tiles': 3}, 'model': '/home/ekhvedchenia/vlm-hf-code/nano_vl_v2'}
INFO 10-08 11:20:49 [model.py:597] Resolved architecture: NemotronH_Nano_VL_V2
`torch_dtype` is deprecated! Use `dtype` instead!
INFO 10-08 11:20:49 [model.py:1658] Using max model len 131072
INFO 10-08 11:20:49 [scheduler.py:225] Chunked prefill is enabled with max_num_batched_tokens=8192.

L649: max_num_tiles=3
L516: Passed max_num_tiles=3 self.max_num_tiles=3
L516: Passed max_num_tiles=3 self.max_num_tiles=3
hf_processor_mm_kwargs.get("max_num_tiles") 3
L649: max_num_tiles=3
L516: Passed max_num_tiles=3 self.max_num_tiles=3
L516: Passed max_num_tiles=3 self.max_num_tiles=3
L649: max_num_tiles=3
L516: Passed max_num_tiles=3 self.max_num_tiles=3
hf_processor_mm_kwargs.get("max_num_tiles") 3

INFO 10-08 11:21:00 [llm.py:341] Supported_tasks: ('generate',)
L516: Passed max_num_tiles=3 self.max_num_tiles=3

Adding requests:   0%|                                    | 0/1 [00:00<?, ?it/s]WARNING 10-08 
L516: Passed max_num_tiles=2 self.max_num_tiles=2
hf_processor_mm_kwargs.get("max_num_tiles") 2
Adding requests: 100%|████████████████████████████| 1/1 [00:00<00:00,  2.40it/s]
Processed prompts: 100%|█| 1/1 [00:04<00:00,  4.92s/it, est. speed input: 160.94
A yacht is sailing on the water.

@BloodAxe BloodAxe marked this pull request as ready for review October 8, 2025 08:53

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

class NanoNemotronVLDummyInputsBuilder(BaseDummyInputsBuilder[_I]):
    """Basic image-only DummyInputsBuilder for InternVL-style models."""

    def get_dummy_text(self, mm_counts: Mapping[str, int]) -> str:
        num_images = mm_counts.get("image", 0)
        return "<image>" * num_images

    def get_dummy_mm_data(
        self,
        seq_len: int,
        mm_counts: Mapping[str, int],
        mm_options: Optional[Mapping[str, BaseDummyOptions]] = None,
    ) -> MultiModalDataDict:
        # Use default max_num_tiles for dummy data generation
        max_num_tiles = 12
        target_width, target_height = self.info.get_image_size_with_most_features(
            max_num_tiles
        )
P1: Dummy inputs ignore configured max tiles

The new max_num_tiles option is stored on the processor, but the dummy-input builder still hardcodes max_num_tiles = 12 when generating the largest test image. If a model is configured with a higher default (e.g. mm_processor_kwargs={"max_num_tiles": 24}), the dummy data used for profiling and determining token capacities will be built for only 12 tiles and the engine will under-estimate the number of image tokens. That can lead to insufficient memory/token allocation during startup and subsequent runtime failures once requests use the larger tile count. This should pull the value from the processor (or from the kwargs) instead of the literal 12.
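
A minimal sketch of the suggested fix, reading the configured value off the processing info instead of the literal (the max_num_tiles attribute on self.info is an assumption based on this PR's description, not verified against the merged code):

    def get_dummy_mm_data(
        self,
        seq_len: int,
        mm_counts: Mapping[str, int],
        mm_options: Optional[Mapping[str, BaseDummyOptions]] = None,
    ) -> MultiModalDataDict:
        # Fall back to 12 only when no override was configured, so the
        # profiling image is built for the tile count actually in use.
        max_num_tiles = getattr(self.info, "max_num_tiles", 12)
        target_width, target_height = self.info.get_image_size_with_most_features(
            max_num_tiles
        )
        ...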


@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 8, 2025 09:27
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 8, 2025
@DarkLight1337 DarkLight1337 merged commit f9582fd into vllm-project:main Oct 8, 2025
54 checks passed
mrasquinha-g pushed a commit to mrasquinha-g/vllm that referenced this pull request Oct 9, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
@BloodAxe BloodAxe deleted the bugfix/nano-allow-max-num-tiles-override branch October 22, 2025 13:35
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>