Prefer HAVE_ALIGNED_ALLOC when available in zng_alloc #1635

Merged (3 commits, Feb 7, 2024)

Conversation

nmoinvaz
Member

@nmoinvaz nmoinvaz commented Jan 4, 2024

Added some more helpful comments for people who come across this code. Otherwise I think people would skip straight to the APPLE branch and forget that different versions of macOS have other aligned memory functions.


codecov bot commented Jan 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (fd5b20f) 83.03% vs head (6c3a30e) 83.11%.
Report is 2 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1635      +/-   ##
===========================================
+ Coverage    83.03%   83.11%   +0.08%     
===========================================
  Files          133      133              
  Lines        10896    10895       -1     
  Branches      2812     2811       -1     
===========================================
+ Hits          9047     9055       +8     
+ Misses        1147     1146       -1     
+ Partials       702      694       -8     
Flag Coverage Δ
macos_clang 42.25% <0.00%> (-0.73%) ⬇️
macos_gcc 74.51% <87.50%> (ø)
ubuntu_clang 81.91% <50.00%> (-0.01%) ⬇️
ubuntu_clang_debug 81.60% <50.00%> (+0.02%) ⬆️
ubuntu_clang_inflate_allow_invalid_dist 81.57% <50.00%> (-0.01%) ⬇️
ubuntu_clang_inflate_strict 81.90% <50.00%> (-0.01%) ⬇️
ubuntu_clang_mmap 82.23% <50.00%> (-0.01%) ⬇️
ubuntu_clang_pigz 13.78% <50.00%> (+0.05%) ⬆️
ubuntu_clang_pigz_no_optim 11.42% <50.00%> (+0.06%) ⬆️
ubuntu_clang_pigz_no_threads 13.69% <50.00%> (+0.05%) ⬆️
ubuntu_clang_reduced_mem 82.31% <50.00%> (-0.01%) ⬇️
ubuntu_clang_toolchain_riscv ∅ <ø> (∅)
ubuntu_gcc 75.03% <37.50%> (-0.05%) ⬇️
ubuntu_gcc_aarch64 77.20% <37.50%> (-0.06%) ⬇️
ubuntu_gcc_aarch64_compat_no_opt 75.50% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_aarch64_no_acle 76.03% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_aarch64_no_neon 76.03% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_armhf 76.99% <37.50%> (-0.06%) ⬇️
ubuntu_gcc_armhf_compat_no_opt 75.47% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_armhf_no_acle 76.91% <37.50%> (-0.06%) ⬇️
ubuntu_gcc_armhf_no_neon 77.06% <37.50%> (-0.06%) ⬇️
ubuntu_gcc_armsf 74.46% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_armsf_compat_no_opt 73.92% <87.50%> (+0.03%) ⬆️
ubuntu_gcc_benchmark 73.36% <87.50%> (+0.14%) ⬆️
ubuntu_gcc_compat_no_opt 76.62% <37.50%> (-0.06%) ⬇️
ubuntu_gcc_compat_sprefix 73.56% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_m32 73.23% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_mingw_i686 73.48% <100.00%> (+0.01%) ⬆️
ubuntu_gcc_mingw_x86_64 73.49% <100.00%> (+0.01%) ⬆️
ubuntu_gcc_mips 74.79% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_mips64 74.80% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_native_inst 74.64% <87.50%> (?)
ubuntu_gcc_no_avx2 74.17% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_no_ctz 74.46% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_no_ctzll 74.45% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_no_pclmulqdq 74.09% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_no_sse2 74.36% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_no_sse42 74.04% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_o1 73.99% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_osb ∅ <ø> (∅)
ubuntu_gcc_pigz 37.84% <25.00%> (-0.07%) ⬇️
ubuntu_gcc_pigz_aarch64 38.86% <25.00%> (+0.06%) ⬆️
ubuntu_gcc_ppc 73.75% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_ppc64 74.18% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_ppc64_power9 74.36% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_ppc64le 74.25% <87.50%> (?)
ubuntu_gcc_ppc64le_novsx 74.57% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_ppc64le_power9 74.13% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_ppc_no_power8 74.47% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_s390x 74.61% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_s390x_dfltcc 71.73% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_s390x_dfltcc_compat 73.85% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_s390x_no_crc32 74.40% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_sparc64 74.60% <87.50%> (?)
ubuntu_gcc_sprefix 73.23% <87.50%> (+0.01%) ⬆️
win64_gcc 73.86% <100.00%> (?)
win64_gcc_compat_no_opt 74.52% <100.00%> (?)

Flags with carried forward coverage won't be shown.


Review thread on zutil_p.h (resolved)
Added some more helpful comments for people who come across this code.
@nmoinvaz
Member Author

nmoinvaz commented Jan 6, 2024

Weird OSS-Fuzz CI error, maybe it needs to be re-run.

@Dead2
Member

Dead2 commented Jan 6, 2024

AddressSanitizer: invalid alignment requested in aligned_alloc: 64, alignment must be a power of two and the requested size 0xe8 must be a multiple of alignment

Well, that is annoying. But it also kind of makes sense, at least when it comes to vector intrinsics.

I guess that bug(?) has been hiding there all along.

@nmoinvaz nmoinvaz force-pushed the fixes/prefer-aligned-alloc branch 5 times, most recently from b3bfb70 to 7201270 Compare January 7, 2024 02:25
alloc_aligned, when used from C++, requires the C++17 standard. The zutil_p.h
include was removed from test_crc32 since it was causing the same issue and was
not really needed.
-#ifdef HAVE_POSIX_MEMALIGN
+#ifdef HAVE_ALIGNED_ALLOC
+    /* Size must be a multiple of alignment */
+    size = (size + (64 - 1)) & ~(64 - 1);
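The constraint ASan flagged is part of the C11 `aligned_alloc()` contract: the requested size must be a multiple of the alignment. A minimal sketch of the rounding fix, with an illustrative wrapper name (not zlib-ng's actual `zng_alloc` code):

```c
#include <stdlib.h>

#define ZNG_ALIGNMENT 64  /* cache-line alignment used for buffers */

/* Illustrative wrapper (hypothetical name): C11 aligned_alloc() requires
 * size to be a multiple of alignment, so round size up first. */
static void *aligned_alloc_padded(size_t size) {
    size = (size + (ZNG_ALIGNMENT - 1)) & ~((size_t)ZNG_ALIGNMENT - 1);
    return aligned_alloc(ZNG_ALIGNMENT, size);
}
```

For example, the 0xe8-byte (232-byte) request from the ASan report rounds up to 256, which satisfies the contract.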
Member
I wonder whether this should instead be an assert, with each alloc individually padded to the nearest 64 bytes.
That would ensure we always have a 64-byte-aligned end (avoiding hidden bugs) and a predictable allocation size. Except for macOS (maybe we should start using _aligned_alloc() and _aligned_free() there).

The only problem with either solution is that we end up allocating more memory than before. That will cause warnings in Nginx and require it to be patched, but no other application that we know of hard-codes what we can alloc.

I wonder whether we should consider always making just a single alloc, and then distributing the needed buffers from that. That would avoid Nginx or any other application needing to optimize away our alloc() calls, while also letting us more easily ensure that we always have the needed padding and alignment no matter what platform we are on. It would also likely reduce memory usage slightly, because each buffer we delegate starts and ends on a 64-byte boundary, packing the allocations together more tightly. Optionally we could 64-align only the buffers that need it, letting the remainder be 16-aligned.
If we go this route, we should perhaps go there directly, to avoid multiple changes that affect external applications like Nginx.
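As a rough sketch of that single-allocation idea (all names here are hypothetical, not zlib-ng's): one backing malloc, with each requested buffer handed out on a 64-byte boundary and everything released with a single free.

```c
#include <stdint.h>
#include <stdlib.h>

#define ARENA_ALIGN 64

typedef struct {
    unsigned char *base;   /* the single backing allocation */
    size_t used;           /* bytes handed out so far */
    size_t capacity;       /* total bytes in the backing allocation */
} arena_t;

/* Hypothetical sketch: over-allocate so the first sub-buffer can be
 * aligned up to a 64-byte boundary. */
static int arena_init(arena_t *a, size_t capacity) {
    a->base = malloc(capacity + ARENA_ALIGN - 1);
    a->used = 0;
    a->capacity = capacity + ARENA_ALIGN - 1;
    return a->base != NULL;
}

/* Carve a 64-byte-aligned sub-buffer out of the backing allocation. */
static void *arena_alloc(arena_t *a, size_t size) {
    uintptr_t start = ((uintptr_t)a->base + a->used + (ARENA_ALIGN - 1))
                      & ~((uintptr_t)ARENA_ALIGN - 1);
    size_t offset = start - (uintptr_t)a->base;
    if (offset + size > a->capacity)
        return NULL;           /* caller falls back to a real alloc */
    a->used = offset + size;
    return (void *)start;
}

static void arena_free(arena_t *a) {
    free(a->base);             /* one free releases every sub-buffer */
}
```

Because each sub-buffer starts on a cache-line boundary, consecutive allocations pack together with at most 63 bytes of padding between them.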

Member Author
I wonder whether this should instead be an assert, with each alloc individually padded to the nearest 64 bytes.

I think that moves the complexity to other parts of the code, whereas right here it is nicely contained. However, in that function, I wouldn't mind moving that particular line of code to affect all malloc functions.

I wonder whether we should consider always making just a single alloc, and then distributing the needed buffers from that.

Sounds similar to what the Linux kernel needs, where it is all located on the stack. But I think the deflate/inflate structs have to be allocated differently than the window, because the window can be resized.

I'm unfamiliar with what Nginx needs. If you want to make a new PR to test some of these ideas out, we can wait on this PR.

Member
Nginx makes a single allocation for zlib/zlib-ng (with a different total allocation size for each) and then returns parts of that for each alloc call zlib/zlib-ng makes. If zlib/zlib-ng allocates too much, nginx logs a warning and makes another real alloc to avoid failing.
The idea is that nginx spends less CPU time doing alloc calls, leaving more time for real work; it is possibly also somewhat beneficial for cache locality to keep the allocations close together.

I don't know how much that actually helps, but each alloc call has some cost, and that cost is probably even higher now that there are so many CPU mitigations in play. Assuming that is true, reducing zlib init latency in the same way would be beneficial to other applications as well (web browsers, for example).

Hmm, makes me wonder whether we should have some kind of Google Benchmark test for deflate-init and inflate-init. That would give us a way to make sure we don't regress on how fast init completes.

Collaborator
Using asserts unnecessarily in production code is generally frowned upon, as an assertion failure causes termination of the program. It is very common to allocate a buffer that isn't a multiple of 64 bytes, so padding is, in my professional opinion, the better solution. We shouldn't forget that we already pad allocations to make sure optimized instructions can read the very last byte of "real" data without resorting to single-byte access.

As for nginx, it already does the allocation in an incompatible, unsupported way, and as such we shouldn't care if we break it with any change. Until the nginx patches are merged upstream, there is no justification for not just making new patches for nginx whenever something changes in zlib-ng.

@Dead2
Member

Dead2 commented Feb 2, 2024

Will pull this after #1654, since that also changes alloc sizes.

@Dead2 Dead2 merged commit 9d33c81 into zlib-ng:develop Feb 7, 2024
137 checks passed
@Dead2 Dead2 mentioned this pull request May 30, 2024
@Dead2 Dead2 mentioned this pull request Jun 12, 2024