AVX runtime float/double swizzle small improvement #1189
Conversation
Thanks! Nice fine tuning \o/
Looks like the AVX tests are good now, but I'm unsure what the remaining failure is.
LGTM. I'll fix the emulated part; that's not something you should worry about. Would you mind squashing the last two commits?
@AntoinePrv when you squash, you can also rebase on
__m256 swapped = _mm256_permute2f128_ps(self, self, 0x01); // [high | low]

// normalize mask
batch<uint32_t, A> half_mask = mask % 4;
Does this actually generate different asm?
PS: I am fine either way. It might be worth having a normalize<value> API that converts a value into a mask when it is a power of two, so that we can use it everywhere.
I'm unsure. For a regular integer operation I would say so, but with intrinsics I had a doubt.
I like this PR! Nice that you found another way to optimize this! I would suggest making the blend-mask generation and the normalization pure functions or class methods that we can reuse elsewhere; I think we might need them here and there. This would also allow unit testing them individually, making debugging and development easier. Again, this is just a suggestion. Feel free to ignore it.
Merged as 9d41ad9 (once squashed)
I have a number of swizzle improvements to suggest, but I am starting small to get better accustomed to xsimd.
What do you make of the following change? My motivation was that _mm256_permute2f128_ps is the most expensive operation (though I'm not sure whether that's a problem in a CPU pipeline), so this PR suggests using it only once. It also replaces the modulo with a select mask to make sure it is properly optimized.