Skip to content

Conversation

bigPYJ1151
Copy link
Member

@bigPYJ1151 bigPYJ1151 commented Sep 18, 2025

Purpose

  • Noticed some issues about recently added oneDNN linear kernels on non-x86 platforms. Only enable it for x86 platform.

Fix #25155 #24976

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: jiang1.li <jiang1.li@intel.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to disable problematic oneDNN linear kernels on non-x86 platforms. The change correctly adds an architecture check for the unquantized GEMM path. However, the review identifies that this fix is likely incomplete, as the quantized oneDNN path does not seem to be covered by this change and may still cause issues on non-x86 platforms.

Comment on lines +170 to +171
elif (ops._supports_onednn
and current_platform.get_cpu_architecture() == CpuArchEnum.X86):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This fix appears to be incomplete. While it correctly disables the oneDNN kernel for the unquantized path on non-x86 platforms, the quantized path, which also utilizes oneDNN, seems to be unaddressed. If the underlying issue with oneDNN on non-x86 platforms is general, this omission could lead to incorrect behavior or crashes when running quantized models on those platforms. A similar architecture check should be implemented for the quantized oneDNN dispatch path to ensure a complete fix.

@jikunshang jikunshang enabled auto-merge (squash) September 19, 2025 05:54
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 19, 2025
@jikunshang jikunshang merged commit 8c1d4ac into vllm-project:main Sep 19, 2025
54 checks passed
debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
Signed-off-by: jiang1.li <jiang1.li@intel.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: jiang1.li <jiang1.li@intel.com>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: charlifu <charlifu@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Installation]: run container by docker images failed, because of the oneDNN
2 participants