add dtype and max_num_batched_tokens to classify example so that it can be run out of the box #24965
base: main
Conversation
Signed-off-by: chenfengjin <1871653365@qq.com>
Code Review
This pull request addresses an issue in the classify.py example where it produced NaN probabilities. The fix, which involves setting dtype="bfloat16", is appropriate and resolves the problem. However, the change also introduces a hardcoded max_num_batched_tokens value of 131072. This is unrelated to the primary fix and could lead to out-of-memory errors on systems with less VRAM, undermining the goal of making the example work out-of-the-box. I recommend removing this line to ensure broader compatibility.
model="jason9693/Qwen2.5-1.5B-apeach", | ||
runner="pooling", | ||
enforce_eager=True, | ||
max_num_batched_tokens=131072, |
While adding dtype="bfloat16" is the correct fix for the NaN issue, this line introducing max_num_batched_tokens seems unrelated to the main purpose of the PR. Setting this to a large value like 131072 could cause out-of-memory (OOM) errors for users with less VRAM, which would prevent the example from running 'out of the box'. It's better to let vLLM use its default value, which is dynamically determined based on the hardware. To keep this change focused and ensure broader compatibility, please consider removing this line.
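For illustration, a minimal sketch of what this suggestion would look like, assuming the example constructs its LLM roughly as in the diff above (the code and values here are illustrative, not the actual PR change):

```python
# Sketch of the reviewer's suggestion: keep only the dtype fix and let vLLM
# pick max_num_batched_tokens from its hardware-dependent default.
from vllm import LLM

llm = LLM(
    model="jason9693/Qwen2.5-1.5B-apeach",
    runner="pooling",
    enforce_eager=True,
    dtype="bfloat16",  # the fix for the NaN probabilities reported in this PR
)
```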
max_num_batched_tokens is also necessary, as the default of 4096 is less than max_model_len, which results in errors:
Value error, max_num_batched_tokens (4096) is smaller than max_model_len (131072). This effectively limits the maximum sequence length to max_num_batched_tokens and makes vLLM reject longer sequences. Please increase max_num_batched_tokens or decrease max_model_len. [type=value_error, input_value=ArgsKwargs((), {'runner_t...ync_scheduling': False}), input_type=ArgsKwargs]
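As a hedged sketch, the error message itself names two ways to satisfy this check; only one would be used in practice, and the values below are illustrative:

```python
from vllm import LLM

# Option A (what this PR does): raise max_num_batched_tokens to the model's
# 131072-token context window, as shown in the diff above.
# Option B (sketched here): cap max_model_len instead, keeping the default
# batching limit and reducing memory pressure on smaller GPUs.
llm = LLM(
    model="jason9693/Qwen2.5-1.5B-apeach",
    runner="pooling",
    enforce_eager=True,
    dtype="bfloat16",
    max_model_len=4096,  # illustrative value that satisfies the validation check
)
```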
This example demonstrates the classify API, and internlm/internlm2-1_8b-reward is a reward model, so the current jason9693/Qwen2.5-1.5B-apeach should be the most suitable model. (Almost all classification-related APIs in CI use jason9693/Qwen2.5-1.5B-apeach. To be honest, I don't know why it's used, but it seems to have become a tradition in vLLM.) I did not encounter NaN when running this example locally: python examples/offline_inference/basic/classify.py

Almost all classification-related APIs in CI use jason9693/Qwen2.5-1.5B-apeach, and neither CI nor I have ever encountered NaN results ??? @chenfengjin jason9693/Qwen2.5-1.5B-apeach defaults torch_dtype to float32. From my experience, using bfloat16 inference for a float32 pooling model leads to a significant drop in numerical precision, while float16 inference has almost no loss of precision.
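A minimal sketch of the alternative raised here, running the float32 checkpoint with float16 instead of bfloat16 (illustrative only; this is not part of the PR diff):

```python
from vllm import LLM

# The reviewer's point: jason9693/Qwen2.5-1.5B-apeach ships float32 weights, and
# float16 tends to preserve classification precision better than bfloat16.
llm = LLM(
    model="jason9693/Qwen2.5-1.5B-apeach",
    runner="pooling",
    enforce_eager=True,
    dtype="float16",
)
```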
Purpose
The example basic/classify.py failed to generate probabilities with jason9693/Qwen2.5-1.5B-apeach.
It works after adding the following parameters to the LLM args:
dtype="bfloat16"
max_num_batched_tokens=131072
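For context, a rough sketch of how the modified example would be exercised, assuming the offline classify API used by examples/offline_inference/basic/classify.py (prompts and output handling are illustrative):

```python
from vllm import LLM

prompts = [
    "Hello, my name is",
    "The capital of France is",
]

llm = LLM(
    model="jason9693/Qwen2.5-1.5B-apeach",
    runner="pooling",
    enforce_eager=True,
    dtype="bfloat16",
    max_num_batched_tokens=131072,
)

outputs = llm.classify(prompts)
for prompt, output in zip(prompts, outputs):
    probs = output.outputs.probs  # class probabilities; should no longer be NaN
    print(f"{prompt!r} -> {probs}")
```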
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.