
[Platform] Allow platform use V1 Engine by default #19792


Merged
merged 1 commit into vllm-project:main on Jun 18, 2025

Conversation

Contributor

@wangxiyuan wangxiyuan commented Jun 18, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

More platforms now support the V1 Engine. This PR allows each platform to decide whether to use the V1 Engine by default or not.

With this PR, users no longer need to set VLLM_USE_V1=1 by hand when the platform works well on V1 by default.
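
As a rough illustration, the engine-selection logic can consume the platform's declaration like the sketch below. This is a minimal, self-contained sketch with assumed names (Platform, CudaPlatform, should_use_v1); it is not the actual vLLM API.

    # Sketch of platform-driven V1 defaulting; names are illustrative.
    import os

    class Platform:
        @classmethod
        def supports_v1(cls, model_config) -> tuple[bool, bool]:
            # (is_v1_support, is_v1_default)
            return False, False

    class CudaPlatform(Platform):
        @classmethod
        def supports_v1(cls, model_config) -> tuple[bool, bool]:
            # Supported, and enabled by default, in this sketch.
            return True, True

    def should_use_v1(platform, model_config=None) -> bool:
        supported, default_on = platform.supports_v1(model_config)
        env = os.environ.get("VLLM_USE_V1")
        if env is None:
            # No explicit user choice: follow the platform's default.
            return supported and default_on
        # An explicit VLLM_USE_V1 setting wins, if the platform supports V1.
        return env == "1" and supported

    print(should_use_v1(CudaPlatform))  # True unless VLLM_USE_V1=0 is set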

Test Plan

This change is mainly for out-of-tree (OOT) platforms. The existing tests ensure the current logic is not broken.

Test Result

(Optional) Documentation Update

This is an internal code change; no doc update is needed.


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @wangxiyuan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant improvement in how the V1 Engine is enabled by default across different hardware platforms. By modifying the platform interface to allow platforms to declare their default V1 Engine preference, I've made the system more flexible and user-friendly, eliminating the need for manual configuration in many common scenarios.

Highlights

  • Platform-driven V1 Engine Defaulting: I've refactored the V1 Engine compatibility check to allow each platform (e.g., CUDA, ROCm, TPU, CPU) to explicitly declare not only if it supports the V1 Engine, but also if the V1 Engine should be the default for that platform. This removes the previous hardcoded logic for determining V1 defaults based on hardware type.
  • Updated supports_v1 Interface: The supports_v1 class method in vllm/platforms/interface.py and its implementations (in cpu.py, cuda.py, rocm.py, tpu.py) now return a tuple (is_v1_support, is_v1_default) instead of a single boolean. This new return type provides granular control over V1 Engine defaulting (see the sketch after this list).
  • Simplified User Experience: With this change, users on platforms that default to the V1 Engine will no longer need to manually set the VLLM_USE_V1=1 environment variable, streamlining the setup process.
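
To make the interface change concrete, here is a hedged sketch of what one platform implementation could look like after this PR (the class body below is illustrative; the merged code may differ):

    # Sketch of a per-platform implementation of the new return type.
    class TpuPlatform:

        @classmethod
        def supports_v1(cls, model_config) -> tuple[bool, bool]:
            # Previously a single bool; now (is_v1_support, is_v1_default),
            # letting the platform also opt in to V1 as its default engine.
            return True, True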

@mergify mergify bot added the rocm (Related to AMD ROCm) and tpu (Related to Google TPUs) labels on Jun 18, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request successfully refactors the supports_v1 method across the platform interface and its implementations, allowing each platform to declare whether the V1 engine is supported and whether it is the default. The changes are logical and well-contained. The primary area for improvement is ensuring consistency in the type hints and docstrings of the modified supports_v1 methods in the platform-specific files.

Comment on lines 476 to 483
      def supports_v1(cls, model_config: ModelConfig) -> tuple[bool, bool]:
          """Returns whether the current platform can support v1 for the supplied
          model configuration.

          Returns:
              tuple[bool, bool]: (is_v1_support, is_v1_default)
          """
    -     return False
    +     return False, False
Collaborator


Since the CPU platform is a special case with only partial v1 support, I think we can add an extra default_v1 method to the interface, so that we don't need to touch the other platforms' implementations. WDYT?

    @classmethod
    def default_v1(cls, model_config: ModelConfig) -> bool:
        """Returns whether the current platform uses v1 by default for the
        supplied model configuration.
        """
        return cls.supports_v1(model_config)

For the CPU platform, it would look like this:

    @classmethod
    def default_v1(cls, model_config: ModelConfig) -> bool:
        """Returns whether the current platform uses v1 by default for the
        supplied model configuration.
        """
        return (cls.supports_v1(model_config)
                and cls.get_cpu_architecture() == CpuArchEnum.X86)
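
Under this suggestion, an out-of-tree platform could tune its own default without touching other platforms. The sketch below is hypothetical (MyOOTPlatform and the simplified base class are illustrative only):

    # Hypothetical OOT platform using the suggested default_v1 hook.
    class Platform:
        @classmethod
        def supports_v1(cls, model_config) -> bool:
            return False

        @classmethod
        def default_v1(cls, model_config) -> bool:
            # By default, a platform that supports V1 also defaults to it.
            return cls.supports_v1(model_config)

    class MyOOTPlatform(Platform):
        @classmethod
        def supports_v1(cls, model_config) -> bool:
            return True  # V1 works on this platform

        @classmethod
        def default_v1(cls, model_config) -> bool:
            # Keep V0 as the default for now, even though V1 is supported.
            return False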

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
@mergify mergify bot removed the tpu (Related to Google TPUs) label on Jun 18, 2025
@Isotr0py Isotr0py removed the rocm (Related to AMD ROCm) label on Jun 18, 2025
Collaborator

@Isotr0py Isotr0py left a comment


LGTM now!

@Isotr0py Isotr0py enabled auto-merge (squash) June 18, 2025 07:46
@github-actions github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Jun 18, 2025
@Isotr0py Isotr0py merged commit 257ab95 into vllm-project:main Jun 18, 2025
79 checks passed
yeqcharlotte pushed a commit to yeqcharlotte/vllm that referenced this pull request Jun 22, 2025
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: minpeter <kali2005611@gmail.com>
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yang Wang <elainewy@meta.com>
gmarinho2 pushed a commit to gmarinho2/vllm that referenced this pull request Jun 26, 2025
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 30, 2025
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
wseaton pushed a commit to wseaton/vllm that referenced this pull request Jun 30, 2025
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>