Optimize find for ARM64EC #5597
base: main
Here is a technical note, written in 2024, about ARM64EC and x64 emulation. For writing C++ code, the key points are these. ARM64EC code is AArch64 code. During preprocessing, however, it masquerades as x64 through the predefined macros, not as ARM64. So anyone optimizing for ARM64EC mainly needs to avoid the x64 (AMD64) vector intrinsics, because on ARM64EC those are provided by softintrin.lib, which emulates them rather than running them natively.
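The macro point can be sketched as a compile-time dispatch. This is a minimal illustration, assuming MSVC's documented predefined macros: under ARM64EC, `_M_ARM64EC` is defined together with `_M_X64` and `_M_AMD64`, while `_M_ARM64` is not. `target_path` is a hypothetical helper, not STL code; the point is only that `_M_ARM64EC` has to be tested before `_M_X64`, or ARM64EC builds silently take the x64 intrinsics path.

```cpp
#include <cassert>

// Hypothetical helper illustrating preprocessor dispatch order.
// Assumption (per MSVC's predefined-macro documentation): ARM64EC builds define
// _M_ARM64EC *and* _M_X64/_M_AMD64, but not _M_ARM64.
const char* target_path() {
#if defined(_M_ARM64EC)
    // Must come first: _M_X64 is also defined here.
    // Native AArch64 code; x64 intrinsics would go through softintrin.lib emulation.
    return "arm64ec";
#elif defined(_M_ARM64)
    return "arm64";
#elif defined(_M_X64)
    // Real x64: SSE/AVX intrinsics execute natively.
    return "x64";
#else
    // Any other target (including non-MSVC compilers): scalar fallback.
    return "other";
#endif
}
```

Swapping the first two branches for a plain `#if defined(_M_X64)` check would route ARM64EC into the x64 branch, which is exactly the trap the note warns about.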
Thanks, I now see that ARM64 intrinsics are the way to go, and that no AVX (or later) intrinsics are expected. Still, I'm not sure how bad the emulation is, so I'm not sure that emulated SSE4.2 is necessarily worse than native scalar code.
Pull Request Overview
This PR aims to optimize the find algorithms for the ARM64EC platform by using the optimized C-library functions memchr and wmemchr.
- Introduces an ARM64EC-specific branch in __std_find_trivial_1 using memchr.
- Introduces an ARM64EC-specific branch in __std_find_trivial_2 using wmemchr with type casts and a modified element count.
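The two branches described above can be sketched roughly as follows. This is a hedged illustration, not the PR's code: `find_trivial_1_sketch` and `find_trivial_2_sketch` are hypothetical names, and the signatures only approximate the `__std_find_trivial_*` helpers (pointer range in, pointer to the match or `last` out). The 2-byte version shows the "modified element count": `wmemchr` takes a count of `wchar_t` elements, not bytes. It assumes `wchar_t` is 16 bits, as on MSVC; a scalar loop covers other platforms so the sketch compiles everywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>

// Hypothetical 1-byte find: returns a pointer to the first match, or `last`.
const void* find_trivial_1_sketch(const void* first, const void* last, std::uint8_t val) noexcept {
    if (first == last) {
        return last; // avoid passing a possibly-null pointer to memchr
    }
    const auto* f = static_cast<const unsigned char*>(first);
    const auto* l = static_cast<const unsigned char*>(last);
    const auto count = static_cast<std::size_t>(l - f); // bytes == elements here
    const void* hit = std::memchr(first, val, count);
    return hit ? hit : last;
}

// Hypothetical 2-byte find built on wmemchr.
const void* find_trivial_2_sketch(const void* first, const void* last, std::uint16_t val) noexcept {
    const auto* p = static_cast<const std::uint16_t*>(first);
    const auto* e = static_cast<const std::uint16_t*>(last);
    if (p == e) {
        return last;
    }
    const auto count = static_cast<std::size_t>(e - p); // element count, not byte count
    if constexpr (sizeof(wchar_t) == 2) {
        // Assumption: 16-bit wchar_t (MSVC). The pointer cast mirrors the PR's type casts.
        const wchar_t* hit = std::wmemchr(reinterpret_cast<const wchar_t*>(p), val, count);
        return hit ? static_cast<const void*>(hit) : last;
    } else {
        // wchar_t is wider here, so wmemchr would misread the data: scan directly.
        for (; p != e; ++p) {
            if (*p == val) {
                return p;
            }
        }
        return last;
    }
}
```

Returning `last` on a miss follows the `std::find` convention, so a caller can compare the result against the end of the range.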
Also, we don't need to static_cast from uint16_t to wchar_t. That's a value-preserving integral conversion.
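A small compile-time check makes the "value-preserving" argument concrete. This is a sketch assuming `wchar_t` can hold every `uint16_t` value, which holds for MSVC's unsigned 16-bit `wchar_t` and for the common 32-bit `wchar_t` elsewhere. `converts_losslessly` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: wchar_t's range covers all of uint16_t (true for MSVC and common ABIs).
static_assert(std::numeric_limits<wchar_t>::max() >= std::numeric_limits<std::uint16_t>::max(),
              "wchar_t cannot represent every uint16_t value");

// Hypothetical check: the implicit uint16_t -> wchar_t conversion keeps the value,
// so no static_cast is required when passing a uint16_t to wmemchr.
constexpr bool converts_losslessly(std::uint16_t v) {
    const wchar_t w = v; // implicit conversion, no cast
    return w == v;       // both sides promote to a common type and compare equal
}

static_assert(converts_losslessly(0), "");
static_assert(converts_losslessly(0xFFFF), "");
```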
I'm not completely sure what exactly ARM64EC is (#2740).
But the upstream memchr/wmemchr are apparently properly optimized for it. I didn't benchmark or test this change; it seems to compile, though.