
Conversation


@BloodAxe BloodAxe commented Oct 8, 2025

Purpose

This PR allows users to override the default number of tiles for Nano 2 VL.

Examples:

# Global override for all requests
llm = LLM(
    model_path,
    ...
    mm_processor_kwargs=dict(max_num_tiles=3),
)
# Per-request
llm_inputs = {
    "prompt": prompt,
    "mm_processor_kwargs": dict(max_num_tiles=2),
    "multi_modal_data": {
        "image": image,
    },
}
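
For context, per-request mm_processor_kwargs take precedence over the kwargs passed to the LLM constructor, which in turn override the model's built-in default. A minimal sketch of that precedence (the helper name resolve_max_num_tiles and the fallback default of 12 are illustrative, not part of the vLLM API):

def resolve_max_num_tiles(
    request_kwargs: dict | None,
    engine_kwargs: dict | None,
    default: int = 12,  # assumed model default, not confirmed by this PR
) -> int:
    # Per-request kwargs win over engine-level kwargs; the model
    # default applies only when neither specifies max_num_tiles.
    for kwargs in (request_kwargs, engine_kwargs):
        if kwargs and "max_num_tiles" in kwargs:
            return kwargs["max_num_tiles"]
    return default

# With the settings above:
# resolve_max_num_tiles(dict(max_num_tiles=2), dict(max_num_tiles=3)) == 2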

Test Plan

import cv2
from PIL import Image
from vllm import LLM

llm = LLM(
    model_path,
    trust_remote_code=True,
    mm_processor_kwargs=dict(max_num_tiles=3),
)

# Upscale the test image 3x so it spans multiple tiles; OpenCV loads
# images as BGR, so convert to RGB before building the PIL image.
image = cv2.resize(cv2.imread("yach_image.jpeg"), dsize=None, fx=3, fy=3)
image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

llm_inputs = {
    "prompt": prompt,
    "mm_processor_kwargs": dict(max_num_tiles=2),
    "multi_modal_data": {
        "image": image,
    },
}

outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
generated_text = outputs[0].outputs[0].text
print(generated_text)
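
The snippet above assumes prompt and sampling_params are defined elsewhere; a minimal setup might look like the following (the prompt template is illustrative and should follow the model's actual chat template):

from vllm import SamplingParams

# Illustrative values only; the exact prompt format depends on the
# model's chat template.
prompt = "<image>\nDescribe this image in one sentence."
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)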

Test Result

Omitting irrelevant lines. This output shows that the dummy image is processed with 3 tiles while the user request is processed with 2 tiles, as expected.

INFO 10-08 11:20:48 [__init__.py:224] Automatically detected platform cuda.
INFO 10-08 11:20:49 [utils.py:239] non-default args: {'trust_remote_code': True, 'gpu_memory_utilization': 0.75, 'disable_log_stats': True, 'enforce_eager': True, 'mm_processor_kwargs': {'max_num_tiles': 3}, 'model': '/home/ekhvedchenia/vlm-hf-code/nano_vl_v2'}
INFO 10-08 11:20:49 [model.py:597] Resolved architecture: NemotronH_Nano_VL_V2
`torch_dtype` is deprecated! Use `dtype` instead!
INFO 10-08 11:20:49 [model.py:1658] Using max model len 131072
INFO 10-08 11:20:49 [scheduler.py:225] Chunked prefill is enabled with max_num_batched_tokens=8192.

L649: max_num_tiles=3
L516: Passed max_num_tiles=3 self.max_num_tiles=3
L516: Passed max_num_tiles=3 self.max_num_tiles=3
hf_processor_mm_kwargs.get("max_num_tiles") 3
L649: max_num_tiles=3
L516: Passed max_num_tiles=3 self.max_num_tiles=3
L516: Passed max_num_tiles=3 self.max_num_tiles=3
L649: max_num_tiles=3
L516: Passed max_num_tiles=3 self.max_num_tiles=3
hf_processor_mm_kwargs.get("max_num_tiles") 3

INFO 10-08 11:21:00 [llm.py:341] Supported_tasks: ('generate',)
L516: Passed max_num_tiles=3 self.max_num_tiles=3

Adding requests:   0%|                                    | 0/1 [00:00<?, ?it/s]WARNING 10-08 
L516: Passed max_num_tiles=2 self.max_num_tiles=2
hf_processor_mm_kwargs.get("max_num_tiles") 2
Adding requests: 100%|████████████████████████████| 1/1 [00:00<00:00,  2.40it/s]
Processed prompts: 100%|█| 1/1 [00:04<00:00,  4.92s/it, est. speed input: 160.94
A yacht is sailing on the water.

@BloodAxe BloodAxe marked this pull request as ready for review October 8, 2025 08:53

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

class NanoNemotronVLDummyInputsBuilder(BaseDummyInputsBuilder[_I]):
    """Basic image-only DummyInputsBuilder for InternVL-style models."""

    def get_dummy_text(self, mm_counts: Mapping[str, int]) -> str:
        num_images = mm_counts.get("image", 0)
        return "<image>" * num_images

    def get_dummy_mm_data(
        self,
        seq_len: int,
        mm_counts: Mapping[str, int],
        mm_options: Optional[Mapping[str, BaseDummyOptions]] = None,
    ) -> MultiModalDataDict:
        # Use default max_num_tiles for dummy data generation
        max_num_tiles = 12
        target_width, target_height = self.info.get_image_size_with_most_features(
            max_num_tiles
        )
P1: Dummy inputs ignore configured max tiles

The new max_num_tiles option is stored on the processor, but the dummy-input builder still hardcodes max_num_tiles = 12 when generating the largest test image. If a model is configured with a higher default (e.g. mm_processor_kwargs={"max_num_tiles": 24}), the dummy data used for profiling and determining token capacities will be built for only 12 tiles and the engine will under-estimate the number of image tokens. That can lead to insufficient memory/token allocation during startup and subsequent runtime failures once requests use the larger tile count. This should pull the value from the processor (or from the kwargs) instead of the literal 12.
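
A minimal sketch of the suggested fix, reading the configured value off the processing info instead of the literal (the max_num_tiles attribute on self.info is an assumption based on this PR's description, not verified against the merged code):

    def get_dummy_mm_data(
        self,
        seq_len: int,
        mm_counts: Mapping[str, int],
        mm_options: Optional[Mapping[str, BaseDummyOptions]] = None,
    ) -> MultiModalDataDict:
        # Fall back to 12 only when no override was configured, so the
        # profiling image is built for the tile count actually in use.
        max_num_tiles = getattr(self.info, "max_num_tiles", 12)
        target_width, target_height = self.info.get_image_size_with_most_features(
            max_num_tiles
        )
        ...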


@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 8, 2025 09:27
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 8, 2025
@DarkLight1337 DarkLight1337 merged commit f9582fd into vllm-project:main Oct 8, 2025
54 checks passed
mrasquinha-g pushed a commit to mrasquinha-g/vllm that referenced this pull request Oct 9, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
@BloodAxe BloodAxe deleted the bugfix/nano-allow-max-num-tiles-override branch October 22, 2025 13:35
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…roject#26403)

Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>