
Don't use unaligned access for memcpy instructions #1309

Merged
merged 1 commit into zlib-ng:develop on Aug 17, 2022


nmoinvaz
Member

No description provided.

@nmoinvaz nmoinvaz added the bug label Jun 29, 2022
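For context on the title: a common portable way to read a multi-byte value from an address that may not be suitably aligned is to memcpy the bytes into a local variable instead of dereferencing a cast pointer. The sketch below only illustrates that general idiom; it is not the diff from this PR, and the helper name load32_memcpy is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads 4 bytes starting at p, whatever p's alignment.
 * A direct `*(const uint32_t *)p` would be undefined behaviour when p is
 * misaligned (and may also violate strict aliasing). */
static inline uint32_t load32_memcpy(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof(v)); /* copy into a properly aligned local */
    return v;
}
```

With optimization enabled, GCC and Clang typically lower a fixed-size memcpy like this to a single load on targets that permit unaligned access, so the portable form usually costs nothing there.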
@phprus
Contributor

phprus commented Jun 29, 2022

Compare without memcmp (see: eq_uint64_t_tmp):
https://godbolt.org/z/TavGsbTh9

GCC, Clang and MSVC do not call memcpy.
But I don't know whether the eq_uint64_t_tmp function will be inlined.

On ARM, and with GCC < 9, there will still be calls to memcmp.
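A minimal sketch of the "compare without memcmp" idea, assuming both inputs have at least 8 readable bytes: each 8-byte run is memcpy'd into a uint64_t temporary and the two integers are compared, instead of calling memcmp(a, b, 8) == 0. The name eq8_via_tmp is hypothetical; the actual eq_uint64_t_tmp function exists only in the linked Compiler Explorer snippet and may differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if the 8 bytes at a equal the
 * 8 bytes at b. Neither pointer needs any particular alignment. */
static inline int eq8_via_tmp(const void *a, const void *b) {
    uint64_t va, vb;
    memcpy(&va, a, sizeof(va)); /* load 8 bytes from a */
    memcpy(&vb, b, sizeof(vb)); /* load 8 bytes from b */
    return va == vb;            /* one integer compare instead of memcmp */
}
```

Whether the compiler inlines such a helper and folds the memcpy calls into plain loads is exactly the open question raised in the comment, so the generated code still needs checking per compiler and target.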

@nmoinvaz
Member Author

@phprus, the __attribute__((__may_alias__)) approach would be a good idea for a separate PR; I can see it definitely helps on old GCC.

@phprus
Contributor

phprus commented Jun 29, 2022

@nmoinvaz
On GCC, use __may_alias__ together with aligned(1) (to decrease the alignment).

See the Stack Overflow link in my comment on #1307, and
https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html

But Clang does not support the aligned attribute. I don't know how to force Clang to guarantee unaligned reads.
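The GCC approach described in this comment can be sketched as follows, assuming GCC's documented may_alias and aligned type attributes: a typedef combining both lets code dereference a pointer to it without breaking strict aliasing, and aligned(1) tells the compiler it cannot assume natural alignment. Because Clang also defines __GNUC__, it is explicitly routed to a memcpy fallback here, reflecting the uncertainty about Clang expressed above. The names unaligned_u64 and load64_unaligned are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && !defined(__clang__)
/* GCC only: may_alias exempts accesses through this type from strict-aliasing
 * rules, and aligned(1) on a typedef lowers its alignment requirement to 1,
 * so the compiler must not assume the address is 8-byte aligned. */
typedef uint64_t __attribute__((__may_alias__, __aligned__(1))) unaligned_u64;

static inline uint64_t load64_unaligned(const void *p) {
    return *(const unaligned_u64 *)p;
}
#else
/* Other compilers (including Clang, which also defines __GNUC__):
 * fall back to the portable memcpy idiom. */
static inline uint64_t load64_unaligned(const void *p) {
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
#endif
```

As with any attribute-based approach, it is worth confirming in the generated assembly that GCC emits unaligned-safe loads on strict-alignment targets.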

@codecov

codecov bot commented Jun 29, 2022

Codecov Report

Merging #1309 (52c3511) into develop (2174201) will decrease coverage by 0.02%.
The diff coverage is 91.66%.

@@             Coverage Diff             @@
##           develop    #1309      +/-   ##
===========================================
- Coverage    86.51%   86.48%   -0.03%     
===========================================
  Files          125      126       +1     
  Lines        10670    10671       +1     
  Branches      2664     2670       +6     
===========================================
- Hits          9231     9229       -2     
- Misses        1081     1083       +2     
- Partials       358      359       +1     
Flag Coverage Δ
macos_clang 32.60% <ø> (ø)
macos_gcc 73.79% <36.66%> (-0.09%) ⬇️
ubuntu_clang 85.19% <73.33%> (+0.13%) ⬆️
ubuntu_clang_debug 84.84% <73.33%> (ø)
ubuntu_clang_inflate_allow_invalid_dist 84.91% <73.33%> (ø)
ubuntu_clang_inflate_strict 85.15% <73.33%> (ø)
ubuntu_clang_mmap 85.34% <73.33%> (ø)
ubuntu_clang_pigz 40.08% <50.00%> (ø)
ubuntu_clang_pigz_no_optim 41.49% <71.42%> (ø)
ubuntu_clang_pigz_no_threads 39.67% <50.00%> (ø)
ubuntu_clang_reduced_mem 85.58% <73.33%> (ø)
ubuntu_gcc 75.60% <63.33%> (-0.10%) ⬇️
ubuntu_gcc_aarch64 77.36% <66.66%> (-0.06%) ⬇️
ubuntu_gcc_aarch64_compat_no_opt 75.51% <14.28%> (-0.25%) ⬇️
ubuntu_gcc_aarch64_no_acle 76.08% <12.50%> (-0.19%) ⬇️
ubuntu_gcc_aarch64_no_neon 76.08% <14.28%> (-0.23%) ⬇️
ubuntu_gcc_armhf 77.31% <63.63%> (-0.07%) ⬇️
ubuntu_gcc_armhf_compat_no_opt 75.36% <15.78%> (-0.25%) ⬇️
ubuntu_gcc_armhf_no_acle 77.33% <63.63%> (-0.05%) ⬇️
ubuntu_gcc_armhf_no_neon 77.22% <94.73%> (-0.04%) ⬇️
ubuntu_gcc_armsf 77.13% <92.85%> (ø)
ubuntu_gcc_armsf_compat_no_opt 75.26% <0.00%> (-0.02%) ⬇️
ubuntu_gcc_benchmark 74.10% <75.00%> (-0.03%) ⬇️
ubuntu_gcc_compat_no_opt 76.91% <90.47%> (-0.03%) ⬇️
ubuntu_gcc_compat_sprefix 74.03% <75.00%> (-0.03%) ⬇️
ubuntu_gcc_m32 73.55% <75.00%> (-0.03%) ⬇️
ubuntu_gcc_mingw_i686 0.00% <0.00%> (ø)
ubuntu_gcc_mingw_x86_64 0.00% <0.00%> (ø)
ubuntu_gcc_no_avx2 74.74% <11.11%> (-0.19%) ⬇️
ubuntu_gcc_no_ctz 74.89% <87.50%> (-0.07%) ⬇️
ubuntu_gcc_no_ctzll 74.91% <87.50%> (-0.07%) ⬇️
ubuntu_gcc_no_pclmulqdq 73.92% <10.00%> (-0.23%) ⬇️
ubuntu_gcc_no_sse2 74.89% <12.50%> (-0.19%) ⬇️
ubuntu_gcc_no_sse4 74.55% <10.00%> (-0.22%) ⬇️
ubuntu_gcc_o1 74.40% <0.00%> (-0.05%) ⬇️
ubuntu_gcc_osb ∅ <ø> (∅)
ubuntu_gcc_pigz 37.95% <50.00%> (-0.07%) ⬇️
ubuntu_gcc_pigz_aarch64 38.97% <50.00%> (-0.24%) ⬇️
ubuntu_gcc_ppc 73.65% <83.33%> (ø)
ubuntu_gcc_ppc64 74.52% <66.66%> (ø)
ubuntu_gcc_ppc64le 74.46% <75.00%> (-0.05%) ⬇️
ubuntu_gcc_ppc_no_power8 74.58% <83.33%> (ø)
ubuntu_gcc_s390x 74.91% <83.33%> (ø)
ubuntu_gcc_s390x_dfltcc 72.15% <66.66%> (ø)
ubuntu_gcc_s390x_dfltcc_compat 73.69% <0.00%> (ø)
ubuntu_gcc_s390x_no_crc32 74.70% <83.33%> (ø)
ubuntu_gcc_sparc64 74.81% <83.33%> (ø)
ubuntu_gcc_sprefix 73.69% <75.00%> (-0.03%) ⬇️
win64_gcc 74.06% <69.56%> (+0.03%) ⬆️
win64_gcc_compat_no_opt 74.75% <100.00%> (?)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
arch/power/chunkset_power8.c 100.00% <ø> (ø)
arch/x86/chunkset_sse2.c 28.57% <0.00%> (ø)
insert_string_tpl.h 100.00% <ø> (ø)
arch/arm/chunkset_neon.c 100.00% <100.00%> (ø)
arch/x86/chunkset_avx.c 100.00% <100.00%> (ø)
arch/x86/chunkset_sse41.c 100.00% <100.00%> (ø)
chunkset.c 100.00% <100.00%> (ø)
compare256.c 100.00% <100.00%> (ø)
deflate.h 100.00% <100.00%> (ø)
inffast.c 90.28% <100.00%> (ø)
... and 11 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2174201...52c3511.

@nmoinvaz
Member Author

Rebased.

@shuffle2
Contributor

shuffle2 commented Jul 23, 2022

> Compare without memcmp (see: eq_uint64_t_tmp): https://godbolt.org/z/TavGsbTh9
>
> GCC, Clang and MSVC does not call memcpy. But I don't know if eq_uint64_t_tmp function will inlining.
>
> On ARM and GCC < 9 there will be calls to memcmp.

Hi, I just want to point out that you've typo'd the MSVC arguments there: you meant /O2 but wrote /02. Also, /O2 implies /Og /Oi /Ot /Oy /Ob2 /GF /Gy, so /Oi is not needed. /Ob3 would add more aggressive inlining but isn't needed in this case.

So in the end, MSVC generates nearly the same code as GCC/Clang here: https://godbolt.org/z/9MaYYsTbK
MSVC/ARM64 also emits the call to memcmp.

Member

@Dead2 Dead2 left a comment


LGTM

@Dead2 Dead2 merged commit e22195e into zlib-ng:develop Aug 17, 2022
@Dead2 Dead2 mentioned this pull request Dec 27, 2022
Dead2 added a commit that referenced this pull request Mar 7, 2023
Changes since 2.0.6:
- Fix CVE-2022-37434 #1328
- Fix chunkmemset #1196
- Fix deflateBound too small #1236
- Fix Z_SOLO #1263
- Fix ACLE variant of crc32 #1274
- Fix inflateBack #1311
- Fix deflate_quick windowsize #1431
- Fix DFLTCC bugs related to adler32 #1349 and #1390
- Fix warnings #1194 #1312 #1362
- MacOS build fix #1198
- Add invalid windowBits handling #1293
- Support for Force TZCNT #1186
- Support for aligned_alloc() #1360
- Minideflate improvements #1175 #1238
- Don't use unaligned access for memcpy #1309
- Build system #1209 #1233 #1267 #1273 #1278 #1292 #1316 #1318 #1365
- Test improvements #1208 #1227 #1241 #1353
- Cleanup #1266
- Documentation #1205 #1359
- Misc improvements #1294 #1297 #1306 #1344 #1348
- Backported zlib fixes
- Backported CI workflows from Develop branch
Dead2 added a commit that referenced this pull request Mar 17, 2023
4 participants