[Fix] Improve CPU backend compatibility for RISC-V #25816
Conversation
Code Review
This pull request effectively addresses a crash on RISC-V architectures by improving CPU backend compatibility. The changes are well-structured into two parts: proactively disabling the `chunked_prefill` feature for RISC-V with a clear warning, and adding a safeguard to raise a `NotImplementedError` if the feature is used without its required `intel_extension_for_pytorch` dependency. Both changes are implemented correctly and improve the robustness and user experience of vLLM on non-x86 platforms. The code is clean and the logic is sound. Great work!
Force-pushed from 28268a4 to b1d7bcf
Hi @hmellor,
Force-pushed from b1d7bcf to 536a0a6
Thanks for being receptive to my feedback.
I'm still not sure if these changes are necessary though. It's not clear to me why they're needed.
Force-pushed from 536a0a6 to 00caf1c
Hi @hmellor,
Force-pushed from 8d44d6a to 40a0083
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by: ihb2032 <1355790728@qq.com>
Force-pushed from 40a0083 to cf276f5
LGTM. Thanks for fixing RISC-V
Great! Thanks for your review and guidance.
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by: ihb2032 <1355790728@qq.com>
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by: ihb2032 <1355790728@qq.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Purpose
Fixes #25737
This PR aims to fix crashes and improve the compatibility of vLLM's CPU backend when running on the RISC-V architecture. It addresses two specific issues:

1. IPEX Dependency Crash: The `chunked_prefill` feature in the CPU attention backend unconditionally imports `intel_extension_for_pytorch`, causing a `ModuleNotFoundError` on non-x86 platforms. This PR fixes this by guarding the import with the existing `_use_ipex` flag and raising a `NotImplementedError` if the feature is used without its dependency (see the sketch after this list).
2. Proactive Disabling for RISC-V: To improve the user experience, this PR also adds `riscv64` to the platform exclusion list for the `chunked_prefill` feature. This provides a clear warning at startup and prevents users from attempting to use a feature that is known to be unsupported on their hardware, similar to the existing handling for ARM and POWER architectures.

Together, these changes allow vLLM to initialize and run on RISC-V CPUs without crashing due to these architecture-specific dependencies.
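For illustration, a minimal sketch of the dependency guard described in item 1. The `_use_ipex` flag name comes from the PR description; the function below and its signature are hypothetical simplifications, not the actual vLLM code:

```python
# Minimal sketch (not the actual vLLM code) of the guard described above.
# `_use_ipex` is the flag named in the PR; the function is a hypothetical
# simplification of the CPU attention backend's chunked-prefill path.
try:
    import intel_extension_for_pytorch  # noqa: F401
    _use_ipex = True
except ImportError:
    _use_ipex = False


def _run_chunked_prefill(query, key_cache, value_cache):
    """Hypothetical chunked-prefill entry point for the CPU backend."""
    if not _use_ipex:
        raise NotImplementedError(
            "chunked_prefill requires intel_extension_for_pytorch, which is "
            "not available on this platform (e.g. riscv64)."
        )
    # The IPEX-backed chunked-prefill computation would run here.
    ...
```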
Test Plan
Environment
The fix was tested in the following RISC-V environment:
Test Command
Run any vLLM process that uses the CPU backend, for example, the latency benchmark:
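An illustrative latency-benchmark invocation is shown below; the script path, model, environment variable value, and flags are placeholders and may differ between vLLM versions:

```bash
# Illustrative only; adjust the script path, model, and flags to your vLLM version.
VLLM_CPU_KVCACHE_SPACE=8 python benchmarks/benchmark_latency.py \
    --model facebook/opt-125m \
    --num-iters 3
```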
Test Result
Before this PR
The vLLM engine crashes during initialization with a `ModuleNotFoundError` because it tries to import `intel_extension_for_pytorch` on a RISC-V machine.
After this PR
The vLLM engine now starts successfully. A warning is logged to the console indicating that `chunked_prefill` has been disabled for the RISC-V platform, and the program proceeds to run without crashing. The logged warning looks like this:
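The exact warning text is not reproduced here; as a rough sketch only, platform-based disabling of this kind could look like the following. The architecture names, function, and message are illustrative assumptions, not vLLM's actual configuration code or log output:

```python
import logging
import platform

logger = logging.getLogger(__name__)

# Hypothetical exclusion list; the real vLLM logic and warning text differ.
_CHUNKED_PREFILL_EXCLUDED_ARCHS = {"aarch64", "ppc64le", "riscv64"}


def maybe_disable_chunked_prefill(enable_chunked_prefill: bool) -> bool:
    """Disable chunked prefill on architectures known not to support it."""
    arch = platform.machine()
    if enable_chunked_prefill and arch in _CHUNKED_PREFILL_EXCLUDED_ARCHS:
        logger.warning(
            "chunked_prefill is not supported on %s CPUs; disabling it.", arch
        )
        return False
    return enable_chunked_prefill
```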