Performance regression of x86-64-specific SIMD intrinsics #138725

purplesyringa · 2025-03-20T00:57:06Z

use std::arch::x86_64::*;

pub fn mul_and_shift(a: __m128i, b: __m128i) -> __m128i {
    unsafe { _mm_srli_epi16(_mm_mulhi_epu16(a, b), 1) }
}

This should compile to the following, and Clang does produce it for the equivalent intrinsics:

mul_and_shift:
        pmulhuw xmm0, xmm1
        psrlw   xmm0, 1
        ret

But due to rust-lang/stdarch#1477, these intrinsics are mapped to portable SIMD operations, which are then compiled to this mess:

mul_and_shift:
        pmulhuw xmm0, xmm1
        punpcklwd       xmm1, xmm0
        punpckhwd       xmm0, xmm0
        psrld   xmm0, 17
        psrld   xmm1, 17
        packssdw        xmm1, xmm0
        movdqa  xmm0, xmm1
        ret
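To make the extra instructions easier to read, here is a minimal scalar sketch of what one 16-bit lane computes. It assumes the generic lowering of `_mm_mulhi_epu16` is "widen to 32 bits, multiply, shift right by 16, truncate"; that is my reading of the output above, not something taken from the stdarch source. The function names are hypothetical. Under that assumption, the trailing shift by 1 can be folded into a single 32-bit shift by 17, which would explain the `psrld ..., 17` pair.

```rust
/// Scalar model of one lane of `_mm_mulhi_epu16` (assumed generic form):
/// widen both operands to 32 bits, multiply, keep the high 16 bits.
pub fn mulhi_u16(a: u16, b: u16) -> u16 {
    // 0xFFFF * 0xFFFF fits in a u32, so the multiply cannot overflow.
    ((a as u32 * b as u32) >> 16) as u16
}

/// One lane of `mul_and_shift` as written: high half, then a 16-bit shift by 1.
pub fn mul_and_shift_lane(a: u16, b: u16) -> u16 {
    mulhi_u16(a, b) >> 1
}

/// The same lane with both shifts folded into one 32-bit shift by 17.
/// This mirrors the shape of the regressed output, not its exact instructions.
pub fn mul_and_shift_lane_fused(a: u16, b: u16) -> u16 {
    ((a as u32 * b as u32) >> 17) as u16
}
```

Both forms return the same value for every input, so the fold is valid as arithmetic; the cost is that the result then has to be computed in 32-bit lanes and packed back down, instead of staying in 16-bit lanes for a single `psrlw`.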

This is a regression in 1.75.0. IMO the corresponding parts of the stdarch PR should just be reverted: I (and most people these days, I think) use specialized non-portable intrinsics precisely when LLVM can't optimize generic code correctly, and the PR explicitly breaks this use case. But I'd like to track this and hear other people's opinions.

@rustbot label +C-optimization +I-heavy +I-slow +A-SIMD +O-x86_64 +T-libs +regression-from-stable-to-stable

CatsAreFluffy · 2025-03-20T01:06:31Z

Godbolt

tgross35 · 2025-03-20T01:58:39Z

#124216 has the same root problem. I'm going to close this as a duplicate, but the discussion about how to resolve the regressions should continue there (and it would be good to have your smaller example).

nikic · 2025-03-20T08:45:24Z

Upstream issue: llvm/llvm-project#132166

rustbot added needs-triage A-SIMD C-optimization I-heavy I-slow O-x86_64 regression-from-stable-to-stable T-libs I-prioritize labels Mar 20, 2025

purplesyringa changed the title ~~Performance regression of x86-64 SIMD intrinsics~~ Performance regression of x86-64-specific SIMD intrinsics Mar 20, 2025

rustbot removed the needs-triage label Mar 20, 2025

tgross35 closed this as completed Mar 20, 2025

apiraino removed the I-prioritize label Mar 20, 2025
