Optimize `find` for ARM64EC #5597

AlexGuteniev · 2025-06-16T06:31:39Z

I'm not completely sure what exactly ARM64EC is (#2740).
But the upstream memchr / wmemchr are apparently properly optimized for it.

I didn't benchmark or test this change. It seems to compile though.
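For the 1-byte case, the change amounts to delegating to `memchr`. Below is a minimal sketch, not the PR's actual code. `find_trivial_1_sketch` is a hypothetical name standing in for the STL's internal 1-byte find. It assumes the same `[first, last)` contract, which returns `last` on a miss, whereas `memchr` reports a miss as a null pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 1-byte find over the byte range [first, last).
// Returns a pointer to the first byte equal to val, or last if there is none.
const void* find_trivial_1_sketch(const void* first, const void* last, std::uint8_t val) noexcept {
    const auto* f = static_cast<const unsigned char*>(first);
    const auto* l = static_cast<const unsigned char*>(last);
    assert(f <= l); // precondition: a valid range

    if (f == l) {
        return last; // empty range: avoid handing memchr a possibly-null pointer
    }

    // For 1-byte elements, the byte count and the element count coincide.
    const std::size_t count = static_cast<std::size_t>(l - f);

    // memchr converts val to unsigned char and scans at most count bytes.
    const void* hit = std::memchr(f, val, count);

    // memchr reports "not found" as nullptr; find-style APIs return last instead.
    return hit ? hit : last;
}
```

The null-to-`last` mapping is the main adaptation needed when swapping a C library search into an iterator-style interface.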

YexuanXiao · 2025-06-16T11:47:15Z

Here is a technical note written in 2024 about ARM64EC and x64 emulation. For writing C++ code, the key points are: ARM64EC is AArch64 code, but during preprocessing it masquerades as x64 through macros, not as ARM64. Therefore, anyone who wants to optimize ARM64EC performance only needs to be careful not to use AMD64-specific vector intrinsics, because on ARM64EC those are provided by softintrin.lib, which emulates them in software.
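What this means for preprocessor checks can be sketched as follows. This assumes MSVC's documented predefined macros (`_M_ARM64EC`, `_M_X64`, `_M_AMD64`, `_M_ARM64`). `target_label` is a hypothetical helper, not STL code. The practical consequence is that a code path guarded only by `_M_X64` is also compiled for ARM64EC, where its x64 intrinsics would come from softintrin.lib:

```cpp
// Under ARM64EC, MSVC defines _M_ARM64EC together with the x64 macros
// (_M_X64, _M_AMD64) but does not define _M_ARM64. That is why _M_ARM64EC
// has to be tested before _M_X64; otherwise ARM64EC builds take the x64 branch.
constexpr const char* target_label() noexcept {
#if defined(_M_ARM64EC)
    return "ARM64EC"; // AArch64 code that presents itself as x64
#elif defined(_M_X64)
    return "x64"; // genuine x64: SSE/AVX intrinsics run natively
#elif defined(_M_ARM64)
    return "ARM64";
#else
    return "other"; // e.g. GCC/Clang targets that use different macros
#endif
}
```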

AlexGuteniev · 2025-06-16T13:57:49Z

> Here is a technical note written in 2024 about ARM64EC and x64 emulation

Thanks. I now see that ARM64 intrinsics are the way to go, and that no AVX or later intrinsics are expected.

Still, I'm not sure how costly the emulation is, so I'm not sure that emulated SSE4.2 is necessarily worse than native scalar code.
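One way to start answering this for `find` is to time a native scalar loop against the CRT's `memchr` on an ARM64EC build. The rough harness below rests on several assumptions. `find_scalar`, `find_memchr`, and `time_find` are hypothetical helpers. The harness does not exercise the STL's emulated SSE4.2 path. Its timings are machine-specific, and a serious measurement would use a proper benchmarking framework:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Plain scalar search: on ARM64EC this compiles to ordinary AArch64 code,
// with no emulated x64 SIMD involved.
const unsigned char* find_scalar(const unsigned char* first, const unsigned char* last,
                                 unsigned char val) noexcept {
    for (; first != last; ++first) {
        if (*first == val) {
            break;
        }
    }
    return first;
}

// Library search via memchr, mapped to the same "last on miss" convention.
const unsigned char* find_memchr(const unsigned char* first, const unsigned char* last,
                                 unsigned char val) noexcept {
    assert(first <= last); // precondition: a valid range
    if (first == last) {
        return last;
    }
    const void* hit = std::memchr(first, val, static_cast<std::size_t>(last - first));
    return hit ? static_cast<const unsigned char*>(hit) : last;
}

// Runs `reps` searches for the byte 0xFF over the whole buffer and returns
// the elapsed wall-clock time in nanoseconds.
template <class FindFn>
long long time_find(FindFn fn, const std::vector<unsigned char>& buf, int reps) {
    const unsigned char* first = buf.data();
    const unsigned char* last = first + buf.size();
    std::size_t sink = 0; // accumulate results so each call's output is used
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        sink += static_cast<std::size_t>(fn(first, last, 0xFF) - first);
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile std::size_t keep = sink;
    (void) keep;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A typical use would fill a large buffer with zeros, place a single `0xFF` at the end, and compare `time_find(find_scalar, buf, reps)` with `time_find(find_memchr, buf, reps)`. No particular result is implied here.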

Copilot

Pull Request Overview

This PR aims to optimize the find algorithms for the ARM64EC platform by using the optimized C-library functions memchr and wmemchr.

Introduces an ARM64EC-specific branch in __std_find_trivial_1 using memchr.
Introduces an ARM64EC-specific branch in __std_find_trivial_2 using wmemchr with type casts and a modified element count.
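The 2-byte branch can be sketched in the same way. This sketch is also hypothetical: `find_trivial_2_sketch` is not the STL's name, and the code assumes C++17 for `if constexpr`. It illustrates two points. First, `wmemchr` counts `wchar_t` elements, not bytes. Second, `wmemchr` is a valid substitute only where `wchar_t` is 16 bits, as on Windows. The `reinterpret_cast` assumes that `wchar_t` and `uint16_t` share size and representation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cwchar>

// Hypothetical stand-in for a 2-byte find over [first, last).
// Returns a pointer to the first element equal to val, or last if there is none.
const void* find_trivial_2_sketch(const void* first, const void* last, std::uint16_t val) noexcept {
    const auto* f = static_cast<const std::uint16_t*>(first);
    const auto* l = static_cast<const std::uint16_t*>(last);
    assert(f <= l); // precondition: a valid range

    if (f == l) {
        return last;
    }

    if constexpr (sizeof(wchar_t) == sizeof(std::uint16_t)) {
        // Subtracting uint16_t pointers already yields an element count,
        // which is what wmemchr expects (wchar_t units, not bytes).
        const std::size_t count = static_cast<std::size_t>(l - f);

        // val converts to wchar_t implicitly; no static_cast is required.
        const wchar_t* hit = std::wmemchr(reinterpret_cast<const wchar_t*>(f), val, count);
        return hit ? static_cast<const void*>(hit) : last;
    } else {
        // wchar_t is wider than 16 bits here (e.g. 32 bits on Linux), so wmemchr
        // would misread the buffer; fall back to a plain loop.
        for (; f != l; ++f) {
            if (*f == val) {
                return f;
            }
        }
        return last;
    }
}
```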

stl/src/vector_algorithms.cpp

Also, we don't need to static_cast from uint16_t to wchar_t. That's a value-preserving integral conversion.
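The following compile-time illustration shows why the cast is unnecessary. It relies only on the standard `<limits>` facilities, and `widen` is a hypothetical helper. List-initialization rejects narrowing conversions, so `wchar_t{v}` compiles only because every `uint16_t` value fits in `wchar_t`:

```cpp
#include <cstdint>
#include <limits>

// For the conversion to be value-preserving, wchar_t must be able to hold every
// uint16_t value (0 through 0xFFFF). On Windows wchar_t is a 16-bit unsigned type;
// on typical Linux targets it is a 32-bit type, which also qualifies.
static_assert(std::numeric_limits<wchar_t>::max() >= 0xFFFF, "wchar_t cannot hold 0xFFFF");

// Brace initialization rejects narrowing conversions at compile time, so this
// function compiles only because uint16_t -> wchar_t never loses information.
constexpr wchar_t widen(std::uint16_t v) noexcept {
    return wchar_t{v};
}
```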

Optimize find for ARM64EC

59de6de

AlexGuteniev requested a review from a team as a code owner June 16, 2025 06:31

github-project-automation bot added this to STL Code Reviews Jun 16, 2025

github-project-automation bot moved this to Initial Review in STL Code Reviews Jun 16, 2025

StephanTLavavej added the performance (Must go faster) and ARM64 (Related to the ARM64 architecture) labels Jun 16, 2025

StephanTLavavej self-assigned this Jun 16, 2025

StephanTLavavej requested a review from Copilot June 27, 2025 14:25

Copilot AI reviewed Jun 27, 2025


stl/src/vector_algorithms.cpp (outdated, resolved)

Fix wmemchr() count.

def468e


StephanTLavavej approved these changes Jun 27, 2025


StephanTLavavej removed their assignment Jun 27, 2025

StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Jun 27, 2025

StephanTLavavej mentioned this pull request Jun 27, 2025

ARM64EC: What even is it? #2740 (closed)
