
Conversation


@noooop noooop commented Sep 15, 2025

Purpose

jason9693/Qwen2.5-1.5B-apeach is used to demonstrate the Pooling API, but #20930 makes chunked prefill the default, and since pooling models do not support chunked prefill, the encode task is disabled.

As a result, this demonstration now outputs an error:

Pooling Response:
{'error': {'code': 400,
           'message': 'The model does not support Pooling API',
           'param': None,
           'type': 'BadRequestError'}}

Use internlm/internlm2-1_8b-reward instead, which is a better fit for demonstrating the Pooling API.
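For reference, the request this example sends can be sketched as follows. The helper names here are illustrative, not vLLM's actual client code, and it assumes a server started with something like `vllm serve internlm/internlm2-1_8b-reward --runner pooling`:

```python
import json
import urllib.request


def build_pooling_payload(model: str, text: str) -> bytes:
    """Serialize the JSON body for a POST to the server's /pooling endpoint."""
    return json.dumps({"model": model, "input": text}).encode()


def pooling_request(model: str, text: str, host: str = "http://localhost:8000") -> dict:
    """Send the payload to /pooling and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{host}/pooling",
        data=build_pooling_payload(model, text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a model whose encode task is disabled, the response is the 400 BadRequestError shown above; with a reward model, it carries the pooled output.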

Addresses #24650 (comment):

  • Put pooling-related examples into a separate folder for easy access by users. (Other examples might also need to be organized into folders, but I'm not very familiar with them.)

  • Verify that the output of all other pooling examples is correct.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: wang.yuqi <noooop@126.com>
@noooop noooop requested a review from hmellor as a code owner September 15, 2025 06:14
@mergify mergify bot added documentation Improvements or additions to documentation qwen Related to Qwen models labels Sep 15, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request successfully fixes a broken example in openai_pooling_client.py and improves the project structure by reorganizing pooling-related examples into a dedicated pooling subdirectory. The changes are generally solid. However, I've identified a few inconsistencies in the newly added example commands within docstrings. Specifically, the --runner pooling flag is missing in several places. While vLLM might auto-detect the correct runner, explicitly including this flag would make the examples more robust, consistent with the documentation, and less error-prone for users who might adapt them for other models. I've provided specific suggestions to address this.

Signed-off-by: wang.yuqi <noooop@126.com>
@DarkLight1337
Member

These example files no longer exist in the doc preview. cc @hmellor do you know why this happens?

@noooop noooop changed the title [Misc] Fix openai_pooling_client.py examples [Misc] Fix examples openai_pooling_client.py Sep 15, 2025

hmellor commented Sep 15, 2025

The generation of multi-file examples was hand-written, so it does not cover every corner case.

For this case you just have to add a README.md to each subfolder. Each script will then appear in the Example materials section of the documentation page.

Here is an example of a multi-file example with a README, https://github.com/vllm-project/vllm/tree/main/examples/online_serving/chart-helm, and its corresponding docs page, which:

  • Links to the directory on GitHub
  • Includes the content of the README
  • Lists all the files that are not the README in expandable admonitions (the titles of these admonitions are the file names, so make sure they're informative!)
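Applied to this PR, the resulting layout would look roughly like this (file names beyond README.md and openai_pooling_client.py are illustrative):

```
examples/online_serving/pooling/
├── README.md                 # its content is inlined into the docs page
├── openai_pooling_client.py  # rendered as an expandable admonition
└── ...
```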


noooop commented Sep 15, 2025

  • lists all the files that are not the README in expandable admonitions (the titles of these admonitions are the file names, so make sure they're informative!)

I shouldn't touch these files QvQ

Signed-off-by: wang.yuqi <noooop@126.com>

@hmellor hmellor left a comment


I see you updated the links to these examples in the docs, could you also verify that either:

  • None of these examples are executed in testing
  • Any examples that are executed in testing have their paths updated too


noooop commented Sep 15, 2025

  • None of these examples are executed in testing

As far as I know, none of these examples are executed in testing.

vLLM CI is very complicated; please help me double-check.

Signed-off-by: wang.yuqi <noooop@126.com>

hmellor commented Sep 15, 2025

If none of the examples appear in

```yaml
- label: Examples Test # 30min
  timeout_in_minutes: 45
  mirror_hardwares: [amdexperimental]
  working_dir: "/vllm-workspace/examples"
  source_file_dependencies:
  - vllm/entrypoints
  - examples/
  commands:
  - pip install tensorizer # for tensorizer test
  - python3 offline_inference/basic/generate.py --model facebook/opt-125m
  - python3 offline_inference/basic/generate.py --model meta-llama/Llama-2-13b-chat-hf --cpu-offload-gb 10
  - python3 offline_inference/basic/chat.py
  - python3 offline_inference/prefix_caching.py
  - python3 offline_inference/llm_engine_example.py
  - python3 offline_inference/audio_language.py --seed 0
  - python3 offline_inference/vision_language.py --seed 0
  - python3 offline_inference/vision_language_pooling.py --seed 0
  - python3 offline_inference/vision_language_multi_image.py --seed 0
  - VLLM_USE_V1=0 python3 others/tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 others/tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
  - python3 offline_inference/encoder_decoder_multimodal.py --model-type whisper --seed 0
  - python3 offline_inference/basic/classify.py
  - python3 offline_inference/basic/embed.py
  - python3 offline_inference/basic/score.py
  - VLLM_USE_V1=0 python3 offline_inference/profiling.py --model facebook/opt-125m run_num_steps --num-steps 2
```
I think we're good
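The double-check noooop asked for can be sketched as a tiny script over the command list above (the list of moved files is a hypothetical stand-in, not taken from the PR's actual diff):

```python
# Commands copied from the Examples Test CI step quoted above (abridged).
ci_commands = [
    "python3 offline_inference/basic/generate.py --model facebook/opt-125m",
    "python3 offline_inference/basic/chat.py",
    "python3 offline_inference/basic/classify.py",
    "python3 offline_inference/basic/embed.py",
    "python3 offline_inference/basic/score.py",
]

# Hypothetical list of example files this PR relocates.
moved_examples = [
    "online_serving/pooling/openai_pooling_client.py",
]


def referenced_in_ci(path, commands):
    """Return True if any CI command mentions the given example path."""
    return any(path in cmd for cmd in commands)


# None of the moved files should be invoked by CI, so their paths
# can change without breaking the pipeline.
assert not any(referenced_in_ci(p, ci_commands) for p in moved_examples)
```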

Signed-off-by: wang.yuqi <noooop@126.com>

@hmellor hmellor left a comment


Could you shorten the top-level headings? We are already in the examples section for online/offline examples, and shortening the titles improves readability in the nav drawer.

noooop and others added 3 commits September 15, 2025 18:35
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>

@hmellor hmellor left a comment


LGTM, thanks for consolidating these examples!

@hmellor hmellor enabled auto-merge (squash) September 15, 2025 11:25
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 15, 2025
@hmellor hmellor merged commit bf214ca into vllm-project:main Sep 15, 2025
27 of 29 checks passed
@noooop noooop deleted the fix_pooling_examples branch September 15, 2025 13:08
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>