Performance regression of x86-64-specific SIMD intrinsics #138725

Closed
purplesyringa opened this issue Mar 20, 2025 · 4 comments
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data)
C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such
I-heavy Issue: Problems and improvements with respect to binary size of generated code.
I-slow Issue: Problems and improvements with respect to performance of generated code.
O-x86_64 Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64)
regression-from-stable-to-stable Performance or correctness regression from one stable version to another.
T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@purplesyringa
Contributor

use std::arch::x86_64::*;

pub fn mul_and_shift(a: __m128i, b: __m128i) -> __m128i {
    unsafe { _mm_srli_epi16(_mm_mulhi_epu16(a, b), 1) }
}

This should compile to the following, and Clang does compile the equivalent C code to it:

mul_and_shift:
        pmulhuw xmm0, xmm1
        psrlw   xmm0, 1
        ret
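
For reference, here is a scalar sketch of the lane-wise arithmetic those two instructions perform. The helper names are made up for illustration, and this models only the semantics of the intrinsics, not how stdarch implements them:

```rust
/// Scalar model of `_mm_mulhi_epu16` for a single lane: the high 16 bits
/// of the 32-bit unsigned product. (Hypothetical helper, not a stdarch API.)
fn mulhi_u16(a: u16, b: u16) -> u16 {
    ((u32::from(a) * u32::from(b)) >> 16) as u16
}

/// Scalar model of `mul_and_shift` over all eight 16-bit lanes.
/// `_mm_srli_epi16(x, 1)` is a logical right shift by one in each lane.
pub fn mul_and_shift_scalar(a: [u16; 8], b: [u16; 8]) -> [u16; 8] {
    std::array::from_fn(|i| mulhi_u16(a[i], b[i]) >> 1)
}
```

Since the high half of a 16x16-bit product is at most 0xFFFE, every output lane is at most 0x7FFF.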

But due to rust-lang/stdarch#1477, these intrinsics are mapped to portable SIMD operations, which are then compiled to this mess:

mul_and_shift:
        pmulhuw xmm0, xmm1
        punpcklwd       xmm1, xmm0
        punpckhwd       xmm0, xmm0
        psrld   xmm0, 17
        psrld   xmm1, 17
        packssdw        xmm1, xmm0
        movdqa  xmm0, xmm1
        ret
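
Reading the longer sequence lane by lane suggests why it is still correct, just wasteful. This is a minimal scalar sketch, assuming Intel-syntax operand order for the listing above; `emitted_lane` and `expected_lane` are hypothetical names used only for illustration:

```rust
/// One 32-bit lane of the emitted sequence. After `punpcklwd`/`punpckhwd`
/// the `pmulhuw` result `hi` sits in the upper 16 bits of the lane and
/// some other 16-bit word (`low`) sits below it.
fn emitted_lane(hi: u16, low: u16) -> u16 {
    let widened = (u32::from(hi) << 16) | u32::from(low);
    // psrld by 17: drops `low` entirely plus the lowest bit of `hi`.
    let shifted = widened >> 17;
    // packssdw narrows each 32-bit lane to 16 bits with signed saturation.
    let saturated = (shifted as i32).clamp(i32::from(i16::MIN), i32::from(i16::MAX));
    saturated as i16 as u16
}

/// One 16-bit lane of the two-instruction form: psrlw by 1 on `hi`.
fn expected_lane(hi: u16) -> u16 {
    hi >> 1
}
```

In this model the shifted value never exceeds 0x7FFF, so the signed saturation never triggers and both forms agree for every `hi`, whatever ends up in `low`. The extra unpack, shift, pack, and move instructions are pure overhead.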

This is a regression in 1.75.0. IMO the corresponding parts of the stdarch PR should just be reverted, because I (and most people these days, I think) use specialized non-portable intrinsics precisely when LLVM can't optimize generic code correctly, and the PR explicitly breaks this use case. But I'd like to track this and hear other people's opinions.

@rustbot label +C-optimization +I-heavy +I-slow +A-SIMD +O-x86_64 +T-libs +regression-from-stable-to-stable

@rustbot added the needs-triage, A-SIMD, C-optimization, I-heavy, I-slow, O-x86_64, regression-from-stable-to-stable, T-libs, and I-prioritize labels on Mar 20, 2025
@purplesyringa changed the title from "Performance regression of x86-64 SIMD intrinsics" to "Performance regression of x86-64-specific SIMD intrinsics" on Mar 20, 2025
@purplesyringa

This comment has been minimized.

@rustbot removed the needs-triage label on Mar 20, 2025
@CatsAreFluffy

Godbolt

@tgross35
Contributor

#124216 has the same root problem. I'm going to close this as a duplicate, but the discussion about how to resolve the regressions should continue there (and it would be good to have your smaller example).

@nikic
Contributor

nikic commented Mar 20, 2025

Upstream issue: llvm/llvm-project#132166

@apiraino removed the I-prioritize label on Mar 20, 2025
6 participants