Optimize find for ARM64EC #5597

Open
wants to merge 2 commits into
base: main

Conversation

AlexGuteniev
Contributor

I'm not completely sure what exactly ARM64EC is (#2740).
But the upstream memchr / wmemchr are apparently properly optimized for it.

I didn't benchmark or test this change. It seems to compile though.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner June 16, 2025 06:31
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Jun 16, 2025
@YexuanXiao
Contributor

Here is a technical note written in 2024 about ARM64EC and x64 emulation. The key points for writing C++ code: ARM64EC code is AArch64 code, but during preprocessing it masquerades as x64 through its macros rather than as ARM64. So anyone who wants to optimize ARM64EC performance only needs to avoid AMD64-specific vector intrinsics, because those are provided by softintrin.lib rather than by native hardware instructions.
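
The macro masquerade described above can be sketched as a preprocessor check. This is only an illustration, assuming the MSVC predefined macros `_M_ARM64EC`, `_M_X64`, and `_M_ARM64`; the two helper function names are hypothetical and do not come from the STL sources.

```cpp
#include <string_view>

// Hypothetical helper: report which architecture the preprocessor sees.
// _M_ARM64EC, _M_X64 and _M_ARM64 are MSVC predefined macros; on other
// compilers none of them is defined and the fallback branch is taken.
constexpr std::string_view target_as_seen_by_preprocessor() {
#if defined(_M_ARM64EC)
    // Must be tested first: ARM64EC builds also define _M_X64 (and _M_AMD64).
    return "ARM64EC";
#elif defined(_M_X64)
    return "x64";
#elif defined(_M_ARM64)
    return "ARM64";
#else
    return "other";
#endif
}

// Hypothetical guard: x64 vector intrinsics run natively only on real x64,
// so a check on _M_X64 alone is not enough to select an SSE/AVX code path.
constexpr bool may_use_x64_vector_intrinsics() {
#if defined(_M_X64) && !defined(_M_ARM64EC)
    return true;
#else
    return false;
#endif
}
```

The order of the checks matters: testing `_M_X64` before `_M_ARM64EC` would classify an ARM64EC build as plain x64.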

@AlexGuteniev
Contributor Author

Here is a technical note written in 2024 about ARM64EC and x64 emulation

Thanks, I now see that ARM64 intrinsics are the way to go, and that no AVX+ intrinsics are expected.

Still, I'm not sure how bad the emulation is, so I'm not sure whether emulated SSE4.2 is necessarily worse than native scalar code.

@StephanTLavavej StephanTLavavej added performance Must go faster ARM64 Related to the ARM64 architecture labels Jun 16, 2025
@StephanTLavavej StephanTLavavej self-assigned this Jun 16, 2025
@StephanTLavavej StephanTLavavej requested a review from Copilot June 27, 2025 14:25
@Copilot Copilot AI left a comment

Pull Request Overview

This PR aims to optimize the find algorithms for the ARM64EC platform by using the optimized C-library functions memchr and wmemchr.

  • Introduces an ARM64EC-specific branch in __std_find_trivial_1 using memchr.
  • Introduces an ARM64EC-specific branch in __std_find_trivial_2 using wmemchr with type casts and a modified element count.

Also, we don't need to static_cast from uint16_t to wchar_t. That's a value-preserving integral conversion.
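
The diff itself is not shown in this conversation, so the following is only a minimal sketch of what the two branches might look like. The function names `find_trivial_1_sketch` and `find_trivial_2_sketch` are hypothetical stand-ins, and the scalar fallback for platforms where `wchar_t` is not 2 bytes is an addition for portability, not something the PR is known to contain.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>

// Hypothetical 1-byte branch: search [first, last) for val with memchr.
// memchr returns null on a miss, so map that to last (find-style result).
const void* find_trivial_1_sketch(const void* first, const void* last, std::uint8_t val) noexcept {
    const auto* f = static_cast<const unsigned char*>(first);
    const auto* l = static_cast<const unsigned char*>(last);
    const void* hit = std::memchr(first, val, static_cast<std::size_t>(l - f));
    return hit ? hit : last;
}

// Hypothetical 2-byte branch: wmemchr counts elements, not bytes, so the
// length is computed from uint16_t pointers (the "modified element count").
const void* find_trivial_2_sketch(const void* first, const void* last, std::uint16_t val) noexcept {
    const auto* f = static_cast<const std::uint16_t*>(first);
    const auto* l = static_cast<const std::uint16_t*>(last);
    const std::size_t count = static_cast<std::size_t>(l - f);
    if constexpr (sizeof(wchar_t) == sizeof(std::uint16_t)) {
        // val is passed without a static_cast: uint16_t to wchar_t is a
        // value-preserving integral conversion.
        const void* hit = std::wmemchr(static_cast<const wchar_t*>(first), val, count);
        return hit ? hit : last;
    } else {
        // Assumed fallback where wchar_t is wider than 2 bytes (e.g. Linux).
        for (std::size_t i = 0; i != count; ++i) {
            if (f[i] == val) {
                return f + i;
            }
        }
        return last;
    }
}
```

Both functions return `last` when the value is absent, so a caller can compare the result against the end pointer in the usual way.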
@StephanTLavavej StephanTLavavej removed their assignment Jun 27, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Jun 27, 2025
Labels
ARM64 Related to the ARM64 architecture performance Must go faster
Projects
Status: Ready To Merge
3 participants