@serge-sans-paille
Contributor

No description provided.

@cyb70289
Contributor

It turns out that AVX2 is always enabled by the test build configuration:
https://github.com/xtensor-stack/xsimd/blob/master/test/CMakeLists.txt#L92

@serge-sans-paille
Contributor Author

Thanks @cyb70289. I now have a clean reproducer for your bug. Is it okay if I cherry-pick your fix into this branch and adjust it?

@cyb70289
Contributor

Sure, no problem

@serge-sans-paille
Contributor Author

All green. Thanks @cyb70289 for the hints! and cc @JohanMabille for the merge.

Comment on lines 241 to 243
#if defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define XSIMD_WITH_SSE2 1
#endif
Contributor

If we only enable SSE2 here, xsimd will not support SSE3 or SSE4 when built with MSVC.
I think this is a problem and will still break a lot of code.
MSVC x64 must support SSE4_2 (in fact, /arch:SSE2 is an invalid option for MSVC x64).
What about defining XSIMD_WITH_SSE4_2 if _M_X64 is defined?

Contributor Author

According to the docs, there is indeed no explicit flag for SSE3 and later (based on https://docs.microsoft.com/fr-fr/cpp/build/reference/arch-x86?view=msvc-160), but I wonder whether it is actually supported when no arch flag is specified?

Contributor

x64 MSVC only supports the /arch:AVX variants (AVX, AVX2, AVX512): https://docs.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-160

My understanding is that /arch:XXX allows MSVC to auto-vectorize C++ code using instructions up to the XXX level.
But it doesn't prevent us from using intrinsics directly; the compiler should support the intrinsics of every SIMD level.
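
To make the macro situation discussed in this thread concrete, here is a minimal sketch of compile-time detection. The names `demo_simd_level`, `demo_compile_time_level` and `demo_is_x64` are hypothetical and not part of xsimd. The sketch assumes only predefined compiler macros: `__SSE2__`, `__AVX__` and `__AVX2__` on GCC/Clang, and `_M_X64`, `_M_IX86_FP`, `__AVX__` and `__AVX2__` on MSVC. It has no tier between SSE2 and AVX because, as noted above, the referenced MSVC documentation lists no dedicated /arch flag for SSE3 and later.

```cpp
// Hypothetical helper types and functions for illustration; not xsimd's API.
enum class demo_simd_level { none = 0, sse2 = 1, avx = 2, avx2 = 3 };

// True when the translation unit targets x86-64 (GCC/Clang or MSVC macro).
constexpr bool demo_is_x64()
{
#if defined(__x86_64__) || defined(_M_X64)
    return true;
#else
    return false;
#endif
}

// Highest SIMD level that can be inferred from predefined macros alone.
constexpr demo_simd_level demo_compile_time_level()
{
#if defined(__AVX2__)
    // Defined by GCC/Clang with -mavx2 and by MSVC with /arch:AVX2.
    return demo_simd_level::avx2;
#elif defined(__AVX__)
    // Defined by GCC/Clang with -mavx and by MSVC with /arch:AVX.
    return demo_simd_level::avx;
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    // MSVC does not define __SSE2__, so its own macros are checked as well.
    return demo_simd_level::sse2;
#else
    return demo_simd_level::none;
#endif
}
```

On an x64 MSVC build without any /arch flag, this sketch would report `sse2`, which is exactly the limitation raised in the review: intrinsics for later SSE levels may still compile, but nothing in the predefined macros says so.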

Contributor

@cyb70289 cyb70289 left a comment

@serge-sans-paille
Contributor Author

@cyb70289 unfortunately, I'm not able to reproduce the Windows error locally (on xsimd::extract_pair). Could you have a look?

  batch<T, A> extract_pair(batch<T, A> const& self, batch<T, A> const& other, std::size_t i, ::xsimd::detail::index_sequence<I, Is...>) {
      if(i == I) {
-         return _mm_alignr_epi8(self, other, sizeof(T) * I);
+         return _mm_alignr_epi8(other, self, sizeof(T) * (I));
Contributor

other and self swapped?

@serge-sans-paille
Contributor Author

serge-sans-paille commented Sep 24, 2021 via email

@serge-sans-paille
Contributor Author

@cyb70289 can you confirm, e.g. on your code base, that the current implementation is correct, i.e. that the tests are well written?

@cyb70289
Contributor

> @cyb70289 can you confirm, e.g. on your code base, that the current implementation is correct, i.e. that the tests are well written?

I use extract_pair in my SIMD code, and it works correctly on both x86 and Arm.
So I guess the xsimd test may be incorrect. I'll have a look on my side.

@cyb70289
Contributor

It looks to me like the test code that generates the expected result is wrong:
https://github.com/xtensor-stack/xsimd/blob/master/test/test_extract_pair.cpp#L35-L42

_mm_alignr_epi8(lhs, rhs, 1) should return (from lsb to msb) rhs[1] rhs[2] ... rhs[15] lhs[0],
but the test code expects lhs[1] lhs[2] ... lhs[15] rhs[0].
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_alignr_epi8&expand=302
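
The byte order described above can be checked without any SIMD hardware. The following is a minimal scalar sketch of the documented `_mm_alignr_epi8` semantics; `bytes16` and `alignr_model` are hypothetical names introduced for illustration, and the model is restricted to shift counts of at most 16 bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte register stand-in, index 0 being the least significant byte.
using bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of _mm_alignr_epi8(hi, lo, n) for n <= 16:
// form the 32-byte value hi:lo (lo in the low half), shift it right by
// n bytes, and keep the low 16 bytes.
inline bytes16 alignr_model(const bytes16& hi, const bytes16& lo, std::size_t n)
{
    assert(n <= 16); // larger shifts are outside the scope of this sketch
    bytes16 out{};
    for (std::size_t i = 0; i < 16; ++i)
    {
        const std::size_t src = i + n;
        out[i] = (src < 16) ? lo[src] : hi[src - 16];
    }
    return out;
}
```

With `lhs` as the first argument and `rhs` as the second, `alignr_model(lhs, rhs, 1)` yields `rhs[1]` through `rhs[15]` followed by `lhs[0]`, matching the order stated in the comment above.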

  if (n == I)
  {
-     return vextq_u8(lhs, rhs, I);
+     return vextq_u8(rhs, lhs, I);
Contributor

Ah, vextq_xx is indeed different from _mm_alignr_xx 👍
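
The operand-order difference noted here can be sketched with two scalar models. This is an illustration based on the documented behaviour of the two intrinsics, not xsimd code; `demo_bytes`, `demo_alignr` and `demo_vext` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte register stand-in, index 0 being the least significant byte.
using demo_bytes = std::array<std::uint8_t, 16>;

// Scalar model of _mm_alignr_epi8(a, b, n): the SECOND operand supplies the low bytes.
inline demo_bytes demo_alignr(const demo_bytes& a, const demo_bytes& b, std::size_t n)
{
    assert(n <= 16);
    demo_bytes out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = (i + n < 16) ? b[i + n] : a[i + n - 16];
    return out;
}

// Scalar model of vextq_u8(a, b, n): the FIRST operand supplies the low bytes.
inline demo_bytes demo_vext(const demo_bytes& a, const demo_bytes& b, std::size_t n)
{
    assert(n < 16);
    demo_bytes out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = (i + n < 16) ? a[i + n] : b[i + n - 16];
    return out;
}
```

Under these models, `demo_vext(x, y, n)` equals `demo_alignr(y, x, n)` for every `n` below 16, so the two intrinsics need their operands in opposite order to produce the same result.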

@serge-sans-paille
Contributor Author

All green! cc @JohanMabille this one cost me quite a lot of effort :-)

@JohanMabille
Member

Congratulations on this one!

@JohanMabille JohanMabille merged commit aa54369 into master Sep 27, 2021
@JohanMabille JohanMabille deleted the fix/test-sse2-windows branch September 27, 2021 09:10