-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better ARM intrinsics implementation for dn_simdhash #113095
Conversation
Tagging subscribers to this area: @hoyosjs |
The results are comparable with the scalar approach. This approach - Total AOT time: 47399 ms |
/azp run runtime-ioslike |
Pull request contains merge conflicts. |
/azp run runtime-ioslike |
Azure Pipelines could not run because the pipeline triggers exclude this branch/path. |
/azp run runtime-android |
Azure Pipelines successfully started running 1 pipeline(s). |
Timings from an Ampere ARM64 VM with this PR:
vs the scalar implementation:
Based on this it seems like Apple's CPUs are just really good at scalar code, but the same is not necessarily true for ARM64 chips from other manufacturers. So it's worth landing this to improve performance on i.e. random Android devices even if it won't improve MacOS/iOS. |
/azp run runtime-android |
Azure Pipelines successfully started running 1 pipeline(s). |
Determined that using vector loads instead of byte loads generates smaller, faster ARM assembly, so we do that for ARM64 targets now. This means byte loads are used on x64 and any unsupported target, and vector loads are used on WASM and ARM64.
|
/azp run runtime-android |
Azure Pipelines successfully started running 1 pipeline(s). |
/backport to release/9.0-staging |
Started backporting to release/9.0-staging: https://github.com/dotnet/runtime/actions/runs/13708261750 |
@kg backporting to "release/9.0-staging" failed, the patch most likely resulted in conflicts: $ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch
Applying: New ARM NEON implementation of find_first_matching_suffix_simd
Using index info to reconstruct a base tree...
M src/native/containers/dn-simdhash-arch.h
Falling back to patching base and 3-way merge...
Auto-merging src/native/containers/dn-simdhash-arch.h
CONFLICT (content): Merge conflict in src/native/containers/dn-simdhash-arch.h
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0001 New ARM NEON implementation of find_first_matching_suffix_simd
Error: The process '/usr/bin/git' failed with exit code 128 Please backport manually! |
See #113074
Speculative better implementation of dn_simdhash search for ARM NEON. I've examined the generated assembly and it looks right, but I don't have a way to test this locally right now so I'm testing it on CI.
This by necessity restricts NEON support to ARM64 (ARM32's vector registers are too narrow and not all the intrinsics I need exist on it). On ARM32 we should use scalar fallback.