Merged
5 changes: 3 additions & 2 deletions vllm/model_executor/layers/utils.py
@@ -7,7 +7,7 @@

 from vllm import _custom_ops as ops
 from vllm import envs
-from vllm.platforms import current_platform
+from vllm.platforms import CpuArchEnum, current_platform
 from vllm.utils import direct_register_custom_op


@@ -167,7 +167,8 @@ def dispatch_cpu_unquantized_gemm(
     if remove_weight:
         layer.weight = torch.nn.Parameter(torch.empty(0),
                                           requires_grad=False)
-    elif ops._supports_onednn:
+    elif (ops._supports_onednn
+          and current_platform.get_cpu_architecture() == CpuArchEnum.X86):
Comment on lines +170 to +171
Contributor

high
This fix appears to be incomplete. It correctly disables the oneDNN kernel for the unquantized path on non-x86 platforms, but the quantized path, which also uses oneDNN, is left unguarded. If the underlying oneDNN issue on non-x86 platforms is general, that omission could lead to incorrect behavior or crashes when running quantized models on those platforms. A matching architecture check should be added to the quantized oneDNN dispatch path to make the fix complete.

     origin_weight = layer.weight
     if remove_weight:
         layer.weight = torch.nn.Parameter(torch.empty(0),
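The guard added in this diff, and the matching check the reviewer asks for on the quantized path, both reduce to the same predicate: dispatch to oneDNN only when the ops support it *and* the CPU is x86. A minimal standalone sketch of that predicate is below; `CpuArchEnum` mirrors the enum imported in the diff, while `use_onednn` and its parameters are illustrative stand-ins, not actual vLLM APIs.

```python
from enum import Enum


class CpuArchEnum(Enum):
    """Simplified stand-in for vllm.platforms.CpuArchEnum."""
    X86 = "x86"
    ARM = "arm"


def use_onednn(supports_onednn: bool, arch: CpuArchEnum) -> bool:
    """Mirror the guard from the diff: oneDNN is eligible only when the
    custom ops were built with oneDNN support AND the CPU is x86.

    The reviewer's point is that this same predicate should gate the
    quantized dispatch path, not just the unquantized one.
    """
    return supports_onednn and arch == CpuArchEnum.X86


if __name__ == "__main__":
    # On x86 with oneDNN available, the oneDNN kernel is selected.
    print(use_onednn(True, CpuArchEnum.X86))   # True
    # On ARM the dispatcher must fall through to the non-oneDNN path,
    # even when the ops report oneDNN support.
    print(use_onednn(True, CpuArchEnum.ARM))   # False
```

Applying the same two-condition check at both dispatch sites keeps the x86-only restriction in one logical place and avoids the crash/incorrect-result scenario the comment describes for quantized models on non-x86 CPUs.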