Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARM] Use temporary variable when loading more than 8 bits in chunkmemset_neon(). #927

Merged
merged 1 commit into from
Apr 14, 2021

Conversation

mtl1979
Copy link
Collaborator

@mtl1979 mtl1979 commented Apr 10, 2021

  • ARMv7 does not support unaligned loads to vector registers from memory wider than 8 bits.

See #925.

@codecov
Copy link

codecov bot commented Apr 10, 2021

Codecov Report

Merging #927 (24024a5) into develop (b82d349) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop     #927   +/-   ##
========================================
  Coverage    75.85%   75.86%           
========================================
  Files           74       74           
  Lines         8292     8294    +2     
  Branches      1369     1369           
========================================
+ Hits          6290     6292    +2     
  Misses        1472     1472           
  Partials       530      530           
Flag Coverage Δ
macos_clang 68.65% <ø> (ø)
macos_gcc 67.64% <ø> (ø)
ubuntu_clang 70.27% <ø> (ø)
ubuntu_clang_debug 69.73% <ø> (ø)
ubuntu_clang_inflate_allow_invalid_dist 70.07% <ø> (ø)
ubuntu_clang_inflate_strict 70.33% <ø> (ø)
ubuntu_clang_mmap 70.04% <ø> (ø)
ubuntu_clang_msan 70.33% <ø> (ø)
ubuntu_gcc 70.31% <100.00%> (+<0.01%) ⬆️
ubuntu_gcc_aarch64 70.27% <100.00%> (ø)
ubuntu_gcc_aarch64_compat_no_opt 67.97% <ø> (ø)
ubuntu_gcc_aarch64_no_acle 68.52% <100.00%> (ø)
ubuntu_gcc_aarch64_no_neon 68.21% <ø> (ø)
ubuntu_gcc_armhf 70.69% <100.00%> (ø)
ubuntu_gcc_armhf_compat_no_opt 67.97% <ø> (ø)
ubuntu_gcc_armhf_no_acle 69.81% <100.00%> (ø)
ubuntu_gcc_armhf_no_neon 70.12% <ø> (ø)
ubuntu_gcc_armsf 70.69% <100.00%> (+0.01%) ⬆️
ubuntu_gcc_armsf_compat_no_opt 67.97% <ø> (ø)
ubuntu_gcc_compat_no_opt 69.36% <ø> (ø)
ubuntu_gcc_mingw_i686 0.00% <ø> (ø)
ubuntu_gcc_mingw_x86_64 0.00% <ø> (ø)
ubuntu_gcc_no_avx2 69.65% <ø> (ø)
ubuntu_gcc_no_pclmulqdq 67.78% <ø> (ø)
ubuntu_gcc_no_sse2 68.97% <ø> (ø)
ubuntu_gcc_no_sse4 67.74% <ø> (ø)
ubuntu_gcc_o3 69.36% <ø> (ø)
ubuntu_gcc_osb 71.79% <ø> (ø)
ubuntu_gcc_ppc 71.13% <ø> (ø)
ubuntu_gcc_ppc64 70.32% <ø> (ø)
ubuntu_gcc_ppc64le 69.31% <ø> (ø)
ubuntu_gcc_s390x 68.49% <ø> (ø)
ubuntu_gcc_sparc64 71.03% <ø> (ø)
win64_gcc 72.82% <ø> (ø)
win64_gcc_compat_no_opt 74.16% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
arch/arm/chunkset_neon.c 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b82d349...24024a5. Read the comment docs.

…mset_neon().

* using memcpy() forbids optimizer to optimize away the temporary variable due to aliasing rules.
@mtl1979
Copy link
Collaborator Author

mtl1979 commented Apr 11, 2021

This is how the disassembly of the two relevant functions look now with gcc trunk:

chunkmemset_2(unsigned char*, __simd128_uint8_t*):
 ldrh r3, [r0, #0]
 vdup.16 q8, r3
 vst1.64 {d16-d17}, [r1 :64]
 bx lr
chunkmemset_4(unsigned char*, __simd128_uint8_t*):
 ldr r3, [r0, #0]
 vdup.32 q8, r3
 vst1.64 {d16-d17}, [r1 :64]
 bx lr

bx lr is same as ret on x86 assembly. r0 is from, r1 is chunk and r3is tmp. I disabled inlining so the whole function doesn't get optimized away.

@mtl1979 mtl1979 requested a review from nmoinvaz April 11, 2021 18:36
Copy link
Member

@nmoinvaz nmoinvaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assembly looks exactly like what we would want. Description in commit has good explanation. Nice job!

Copy link

@kdrag0n kdrag0n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the issue for me without changing the generated code on 64-bit ARMv8.

@Dead2 Dead2 merged commit deda158 into zlib-ng:develop Apr 14, 2021
Dead2 added a commit that referenced this pull request May 8, 2021
- Include porting guide in release packages #917
- Documentation improvements #913 #949
- Added Windows ARM binaries in release packages #916
- Fix crash on ARMv7 #927
- Fix building on FreeBSD #921
- Fix building with musl on aarch64 #936
- Fix ARM float-abi detection #918
- Fix cmake detection of risc-v architectures #942
- Minor buildsystem fixes #922 #924 #933 #938 #950
- Improve zlib-compat build #915 #944
- CI/Test improvements #926 #929 #925 #937 #939 #940
Dead2 added a commit that referenced this pull request May 8, 2021
- Include porting guide in release packages #917
- Documentation improvements #913 #949
- Added Windows ARM binaries in release packages #916
- Fix crash on ARMv7 #927
- Fix building on FreeBSD #921
- Fix building with musl on aarch64 #936
- Fix ARM float-abi detection #918
- Fix cmake detection of risc-v architectures #942
- Minor buildsystem fixes #922 #924 #933 #938 #950
- Improve zlib-compat build #915 #944
- CI/Test improvements #926 #929 #927 #937 #939 #940
@Dead2 Dead2 mentioned this pull request May 8, 2021
Dead2 added a commit that referenced this pull request May 8, 2021
- Include porting guide in release packages #917
- Documentation improvements #913 #949
- Added Windows ARM binaries in release packages #916
- Fix crash on ARMv7 #927
- Fix building on FreeBSD #921
- Fix building with musl on aarch64 #936
- Fix ARM float-abi detection #918
- Fix cmake detection of risc-v architectures #942
- Minor buildsystem fixes #922 #924 #933 #938 #950
- Improve zlib-compat build #915 #944
- CI/Test improvements #926 #929 #927 #937 #939 #940
Dead2 added a commit that referenced this pull request May 9, 2021
- Include porting guide in release packages #917
- Documentation improvements #913 #949
- Added Windows ARM binaries in release packages #916
- Fix crash on ARMv7 #927
- Fix building on FreeBSD #921
- Fix building with musl on aarch64 #936 #952
- Fix ARM float-abi detection #918
- Fix cmake detection of risc-v architectures #942
- Minor buildsystem fixes #922 #924 #933 #938 #950
- Improve zlib-compat build #915 #944
- CI/Test improvements #926 #929 #927 #937 #939 #940
Dead2 added a commit that referenced this pull request May 13, 2021
- Include porting guide in release packages #917
- Documentation improvements #913 #949
- Added Windows ARM binaries in release packages #916
- Fix crash on ARMv7 #927
- Fix building on FreeBSD #921
- Fix building with musl on aarch64 #936 #952
- Fix ARM float-abi detection #918
- Fix cmake detection of risc-v architectures #942
- Minor buildsystem fixes #922 #924 #933 #938 #950
- Improve zlib-compat build #915 #944
- CI/Test improvements #926 #929 #927 #937 #939 #940
gsjaardema pushed a commit to gsjaardema/zlib-ng that referenced this pull request May 13, 2021
- Include porting guide in release packages zlib-ng#917
- Documentation improvements zlib-ng#913 zlib-ng#949
- Added Windows ARM binaries in release packages zlib-ng#916
- Fix crash on ARMv7 zlib-ng#927
- Fix building on FreeBSD zlib-ng#921
- Fix building with musl on aarch64 zlib-ng#936 zlib-ng#952
- Fix ARM float-abi detection zlib-ng#918
- Fix cmake detection of risc-v architectures zlib-ng#942
- Minor buildsystem fixes zlib-ng#922 zlib-ng#924 zlib-ng#933 zlib-ng#938 zlib-ng#950
- Improve zlib-compat build zlib-ng#915 zlib-ng#944
- CI/Test improvements zlib-ng#926 zlib-ng#929 zlib-ng#927 zlib-ng#937 zlib-ng#939 zlib-ng#940
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants