Build failure on ARM v8 with SVE (neoverse_v1) architecture #1005

Closed
casparvl opened this issue Feb 27, 2024 · 4 comments

Comments

@casparvl

casparvl commented Feb 27, 2024

Environment

  • GCC 13.2.0
  • ARM Neoverse V1 (in an AWS instance)
  • GCC build flags:
-ftree-vectorize -mcpu=native -fno-math-errno -O3 -DNDEBUG -std=c++17 -fPIC  -march=native
  • xsimd version: 12.1.1

Error
I'm running into a build issue when compiling code that uses xsimd:

/tmp/bot/easybuild/build/DP3/6.0/foss-2023b/DP3/antennaflagger/Flagger.cc:116:66:   required from here
/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/neoverse_v1/software/IDG/1.2.0-foss-2023b/include/xsimd/arch/xsimd_neon.hpp:943:36: error: could not convert dispatcher.xsimd::kernel::detail::neon_dispatcher_base<xsimd::kernel::detail::comp_return_type, __Uint8x16_t, __Int8x16_t, __Uint16x8_t, __Int16x8_t, __Uint32x4_t, __Int32x4_t, __Float32x4_t>::binary::apply<__Float32x4_t>((& lhs)->xsimd::batch<float, xsimd::i8mm<xsimd::neon64> >::<anonymous>.xsimd::types::simd_register<float, xsimd::i8mm<xsimd::neon64> >::<anonymous>.xsimd::types::simd_register<float, xsimd::neon64>::<anonymous>.xsimd::types::simd_register<float, xsimd::neon>::operator register_type(), (& rhs)->xsimd::batch<float, xsimd::i8mm<xsimd::neon64> >::<anonymous>.xsimd::types::simd_register<float, xsimd::i8mm<xsimd::neon64> >::<anonymous>.xsimd::types::simd_register<float, xsimd::neon64>::<anonymous>.xsimd::types::simd_register<float, xsimd::neon>::operator register_type()) from xsimd::kernel::detail::comp_return_type<__Float32x4_t> {aka uint32x4_t} to xsimd::batch_bool<float, xsimd::i8mm<xsimd::neon64> >
  943 |             return dispatcher.apply(register_type(lhs), register_type(rhs));
      |                    ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                    |
      |                                    xsimd::kernel::detail::comp_return_type<__Float32x4_t> {aka uint32x4_t}

The same code, with the same compiler flags / compiler version builds fine on Neoverse N1 (and on zen2, zen3, haswell and skylake by the way).

I've tried to dig into the code of xsimd a bit, but in the above error I'm in a bit over my head when it comes to all the types flying around :) Hoping that someone with more expertise in xsimd spots where this might be going wrong... My bet is there was some change in terms of datatypes, intrinsics, or similar in Neoverse V1 that was not accounted for (yet) in xsimd that makes this go wrong compared to e.g. Neoverse N1.

Not sure if this might be useful, but to get an overview of the supported instructions on N1 vs V1, on Neoverse N1:

$ lscpu | grep Flags
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

And for Neoverse V1:

$ lscpu | grep Flags
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs dcpodp svei8mm svebf16 i8mm bf16 dgh rng

N.B. None of this code is mine: I'm just the guy who has the pleasure of trying to build it on different hardware architectures :)

@casparvl casparvl changed the title from "Build failure on ARM v8 with SVE (neoverser_v1) architecture" to "Build failure on ARM v8 with SVE (neoverse_v1) architecture" Feb 27, 2024
@serge-sans-paille
Contributor

serge-sans-paille commented Feb 27, 2024

Thanks for the bug report! Can you share a minimal c++ input that fails?
I'd also be interested in the output of g++ -v [extra flags] failing_input.cpp

@casparvl
Author

Thanks for the fast response. I'd love to provide a minimal c++ input, maybe you can help me create one since I'm unfamiliar with xsimd... The call where it fails is this one. Can you give me a minimal piece of code that would trigger this dispatcher.apply(...) call? I found https://xsimd.readthedocs.io/en/latest/api/dispatching.html but the code snippet there is not a fully contained example. I think the only thing it is lacking is to provide it with some actual data object before calling float res = dispatched(data, 17), but wasn't sure what that data object should be exactly... If you could help me with turning that into a fully contained example, I can try running it, and I'm giving it a 9/10 chance that will trigger the bug.

Regarding the -v run: let's try that minimal example first, see if we can get rid of the complexity of all the related code. The -v for that will probably be much cleaner.

@serge-sans-paille
Contributor

This looks like the right reproducer: https://godbolt.org/z/E4KxKqcMP
I'll investigate some more

@casparvl
Author

Good thinking on using godbolt for this, that was much quicker than going back and forth with me trying out a compilation natively :) Thanks for investigating this and the quick fix!
