[Core] Get num_encoder_tokens from scheduler config #24989
Conversation
This is the same change that was made in vllm-project#24866. In that PR, it was pointed out that this code:

```python
MULTIMODAL_REGISTRY.get_encdec_max_encoder_len(...)
```

is much slower than reading the same value, which is cached on the scheduler config:

```python
scheduler_config.max_num_encoder_input_tokens
```

This PR makes the same change in more spots: the scheduler, the KV cache manager, and the GPU model runner.

Related to issue vllm-project#24946.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
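For context, here is a minimal sketch of the caching pattern the PR relies on. `MockRegistry` and `MockSchedulerConfig` are illustrative stand-ins, not vLLM's real classes; the point is simply that the expensive value is computed once at config-build time and read as a plain attribute on hot paths:

```python
# Illustrative sketch only -- these mocks are not vLLM's actual classes.

class MockRegistry:
    def get_encdec_max_encoder_len(self, model_config: dict) -> int:
        # The real vLLM call does non-trivial work per invocation,
        # which is why it is slow inside scheduling loops.
        return model_config["max_encoder_len"]

class MockSchedulerConfig:
    def __init__(self, registry: MockRegistry, model_config: dict):
        # Cache the expensive lookup once, up front.
        self.max_num_encoder_input_tokens = \
            registry.get_encdec_max_encoder_len(model_config)

cfg = MockSchedulerConfig(MockRegistry(), {"max_encoder_len": 3000})
# Hot path: a cheap attribute read instead of a registry call.
num_encoder_tokens = cfg.max_num_encoder_input_tokens
```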
Code Review

This pull request aims to optimize performance by replacing a slow function call, `get_encdec_max_encoder_len`, with a cached value from the scheduler configuration. However, the implementation introduces a critical issue. The cached value used, `scheduler_config.max_num_encoder_input_tokens`, is an alias for `max_num_batched_tokens`, which is a scheduling parameter and not the model-specific maximum encoder length. Using this incorrect value for calculating memory usage and allocating the cross-attention KV cache can lead to memory under-allocation and potential out-of-bounds memory access. I have provided comments in each of the affected files recommending that these changes be reverted until a correct caching mechanism for the maximum encoder length is implemented.
```python
num_encoder_tokens = \
    self.scheduler_config.max_num_encoder_input_tokens
```
This change replaces the call to `get_encdec_max_encoder_len` with `self.scheduler_config.max_num_encoder_input_tokens`. However, `max_num_encoder_input_tokens` is initialized to `max_num_batched_tokens` in `SchedulerConfig`, which is not the correct value for the model's maximum encoder length (e.g., a fixed value like 3000 for Whisper). Using a potentially smaller value from `max_num_batched_tokens` will lead to under-allocation of the cross-attention KV cache, which can cause out-of-bounds memory access. This is a critical issue. The original call was functionally correct, though slow. I recommend reverting this change until the value is cached correctly in the configuration.
```diff
- num_encoder_tokens = \
-     self.scheduler_config.max_num_encoder_input_tokens
+ num_encoder_tokens = MULTIMODAL_REGISTRY.\
+     get_encdec_max_encoder_len(
+         self.vllm_config.model_config)
```
Here is where it gets set to the correct value that we want:
Lines 2788 to 2790 in 218454b:

```python
elif self.model_config.is_encoder_decoder:
    self.scheduler_config.max_num_encoder_input_tokens = \
        MULTIMODAL_REGISTRY.get_encdec_max_encoder_len(self.model_config)
```
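In other words, the alias is only the default; config post-init overwrites it for encoder-decoder models before the scheduler, KV cache manager, or model runner read it. A simplified sketch of that ordering (the names are stand-ins, not the real vLLM objects):

```python
# Sketch of the initialization order the quoted config code implies.

class SketchSchedulerConfig:
    def __init__(self, max_num_batched_tokens: int):
        # Default: aliased to the scheduling limit...
        self.max_num_batched_tokens = max_num_batched_tokens
        self.max_num_encoder_input_tokens = max_num_batched_tokens

def finalize(cfg: SketchSchedulerConfig, is_encoder_decoder: bool,
             encdec_max_len: int) -> None:
    # ...but for encoder-decoder models it is replaced with the model's
    # true maximum encoder length, so later cached reads are correct.
    if is_encoder_decoder:
        cfg.max_num_encoder_input_tokens = encdec_max_len

cfg = SketchSchedulerConfig(max_num_batched_tokens=8192)
finalize(cfg, is_encoder_decoder=True, encdec_max_len=3000)
assert cfg.max_num_encoder_input_tokens == 3000
```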
This isn't a huge improvement (about a 3% throughput increase), but it's a good step. There's still something more significant limiting throughput that I'm tracking down.
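If you want to sanity-check the relative cost yourself, a rough `timeit` harness along these lines would show the gap; `recomputed_lookup` is just a placeholder for the per-call work the registry lookup performs, not vLLM's actual function:

```python
import timeit

class Cfg:
    max_num_encoder_input_tokens = 3000

cfg = Cfg()

def recomputed_lookup() -> int:
    # Placeholder cost standing in for per-call registry work.
    return sum(range(1_000))

n = 100_000
attr = timeit.timeit(lambda: cfg.max_num_encoder_input_tokens, number=n)
call = timeit.timeit(recomputed_lookup, number=n)
print(f"attribute read: {attr:.4f}s  recomputed call: {call:.4f}s")
```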