Prefer HAVE_ALIGNED_ALLOC when available in zng_alloc #1635

Merged (3 commits, Feb 7, 2024)

Conversation

nmoinvaz
Member

@nmoinvaz nmoinvaz commented Jan 4, 2024

Added some more helpful comments for people who come across this code. Otherwise I think people would skip straight to the APPLE branch and forget that different versions of macOS have other aligned memory functions.


codecov bot commented Jan 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (fd5b20f) 83.03% vs head (6c3a30e) 83.11%.
Report is 2 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1635      +/-   ##
===========================================
+ Coverage    83.03%   83.11%   +0.08%     
===========================================
  Files          133      133              
  Lines        10896    10895       -1     
  Branches      2812     2811       -1     
===========================================
+ Hits          9047     9055       +8     
+ Misses        1147     1146       -1     
+ Partials       702      694       -8     
Flag Coverage Δ
macos_clang 42.25% <0.00%> (-0.73%) ⬇️
macos_gcc 74.51% <87.50%> (ø)
ubuntu_clang 81.91% <50.00%> (-0.01%) ⬇️
ubuntu_clang_debug 81.60% <50.00%> (+0.02%) ⬆️
ubuntu_clang_inflate_allow_invalid_dist 81.57% <50.00%> (-0.01%) ⬇️
ubuntu_clang_inflate_strict 81.90% <50.00%> (-0.01%) ⬇️
ubuntu_clang_mmap 82.23% <50.00%> (-0.01%) ⬇️
ubuntu_clang_pigz 13.78% <50.00%> (+0.05%) ⬆️
ubuntu_clang_pigz_no_optim 11.42% <50.00%> (+0.06%) ⬆️
ubuntu_clang_pigz_no_threads 13.69% <50.00%> (+0.05%) ⬆️
ubuntu_clang_reduced_mem 82.31% <50.00%> (-0.01%) ⬇️
ubuntu_clang_toolchain_riscv ∅ <ø> (∅)
ubuntu_gcc 75.03% <37.50%> (-0.05%) ⬇️
ubuntu_gcc_aarch64 77.20% <37.50%> (-0.06%) ⬇️
ubuntu_gcc_aarch64_compat_no_opt 75.50% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_aarch64_no_acle 76.03% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_aarch64_no_neon 76.03% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_armhf 76.99% <37.50%> (-0.06%) ⬇️
ubuntu_gcc_armhf_compat_no_opt 75.47% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_armhf_no_acle 76.91% <37.50%> (-0.06%) ⬇️
ubuntu_gcc_armhf_no_neon 77.06% <37.50%> (-0.06%) ⬇️
ubuntu_gcc_armsf 74.46% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_armsf_compat_no_opt 73.92% <87.50%> (+0.03%) ⬆️
ubuntu_gcc_benchmark 73.36% <87.50%> (+0.14%) ⬆️
ubuntu_gcc_compat_no_opt 76.62% <37.50%> (-0.06%) ⬇️
ubuntu_gcc_compat_sprefix 73.56% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_m32 73.23% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_mingw_i686 73.48% <100.00%> (+0.01%) ⬆️
ubuntu_gcc_mingw_x86_64 73.49% <100.00%> (+0.01%) ⬆️
ubuntu_gcc_mips 74.79% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_mips64 74.80% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_native_inst 74.64% <87.50%> (?)
ubuntu_gcc_no_avx2 74.17% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_no_ctz 74.46% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_no_ctzll 74.45% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_no_pclmulqdq 74.09% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_no_sse2 74.36% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_no_sse42 74.04% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_o1 73.99% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_osb ∅ <ø> (∅)
ubuntu_gcc_pigz 37.84% <25.00%> (-0.07%) ⬇️
ubuntu_gcc_pigz_aarch64 38.86% <25.00%> (+0.06%) ⬆️
ubuntu_gcc_ppc 73.75% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_ppc64 74.18% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_ppc64_power9 74.36% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_ppc64le 74.25% <87.50%> (?)
ubuntu_gcc_ppc64le_novsx 74.57% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_ppc64le_power9 74.13% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_ppc_no_power8 74.47% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_s390x 74.61% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_s390x_dfltcc 71.73% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_s390x_dfltcc_compat 73.85% <87.50%> (+0.02%) ⬆️
ubuntu_gcc_s390x_no_crc32 74.40% <87.50%> (+0.01%) ⬆️
ubuntu_gcc_sparc64 74.60% <87.50%> (?)
ubuntu_gcc_sprefix 73.23% <87.50%> (+0.01%) ⬆️
win64_gcc 73.86% <100.00%> (?)
win64_gcc_compat_no_opt 74.52% <100.00%> (?)

Flags with carried forward coverage won't be shown.


Review thread on zutil_p.h (resolved)
Added some more helpful comments for people who come across this code.
@nmoinvaz
Member Author

nmoinvaz commented Jan 6, 2024

Weird OSS-Fuzz CI error, maybe it needs to be re-run.

@Dead2
Member

Dead2 commented Jan 6, 2024

AddressSanitizer: invalid alignment requested in aligned_alloc: 64, alignment must be a power of two and the requested size 0xe8 must be a multiple of alignment

Well, that is annoying. But it also kind of makes sense, at least when it comes to vector intrinsics.

I guess that bug(?) has been hiding there all along.

@nmoinvaz nmoinvaz force-pushed the fixes/prefer-aligned-alloc branch 5 times, most recently from b3bfb70 to 7201270 Compare January 7, 2024 02:25
alloc_aligned, when used from C++, requires the C++17 standard. The zutil_p.h
include was removed from test_crc32 since it was causing the same issue and was
not really needed.
-#ifdef HAVE_POSIX_MEMALIGN
+#ifdef HAVE_ALIGNED_ALLOC
+    /* Size must be a multiple of alignment */
+    size = (size + (64 - 1)) & ~(64 - 1);
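The constraint ASan flagged is part of the C11 `aligned_alloc()` contract: the requested size must be a multiple of the alignment. A minimal sketch of the rounding fix, with an illustrative wrapper name (not zlib-ng's actual `zng_alloc` code):

```c
#include <stdlib.h>

#define ZNG_ALIGNMENT 64  /* cache-line alignment used for buffers */

/* Illustrative wrapper (hypothetical name): C11 aligned_alloc() requires
 * size to be a multiple of alignment, so round size up first. */
static void *aligned_alloc_padded(size_t size) {
    size = (size + (ZNG_ALIGNMENT - 1)) & ~((size_t)ZNG_ALIGNMENT - 1);
    return aligned_alloc(ZNG_ALIGNMENT, size);
}
```

For example, the 0xe8-byte (232-byte) request from the ASan report rounds up to 256, which satisfies the contract.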
Member
I wonder whether this should instead be an assert, with each alloc individually padded to the nearest 64 bytes.
That would ensure we always have a 64-byte-aligned end (avoiding hidden bugs) and a predictable allocation size. Except for macOS (maybe we should start using _aligned_alloc() and _aligned_free() there).

The only problem with either solution is that we end up allocating more memory than before. That will cause warnings in Nginx and require it to be patched, but no other application that we know of hard-codes what we can alloc.

I wonder whether we should consider always making just a single alloc, and then distributing the needed buffers from that. That would avoid Nginx or any other application needing to optimize away our alloc() calls, while also letting us more easily ensure that we always have the needed padding and alignment no matter what platform we are on. It would also likely reduce memory usage slightly, because each buffer we delegate starts and ends on a 64-byte boundary, packing the allocations together more tightly. Optionally we could 64-align only the buffers that need it, letting the remainder be 16-aligned.
If we go this route, we should perhaps go there directly, to avoid multiple changes that affect external applications like Nginx.
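As a rough sketch of that single-allocation idea (all names here are hypothetical, not zlib-ng's): one backing malloc, with each requested buffer handed out on a 64-byte boundary and everything released with a single free.

```c
#include <stdint.h>
#include <stdlib.h>

#define ARENA_ALIGN 64

typedef struct {
    unsigned char *base;   /* the single backing allocation */
    size_t used;           /* bytes handed out so far */
    size_t capacity;       /* total bytes in the backing allocation */
} arena_t;

/* Hypothetical sketch: over-allocate so the first sub-buffer can be
 * aligned up to a 64-byte boundary. */
static int arena_init(arena_t *a, size_t capacity) {
    a->base = malloc(capacity + ARENA_ALIGN - 1);
    a->used = 0;
    a->capacity = capacity + ARENA_ALIGN - 1;
    return a->base != NULL;
}

/* Carve a 64-byte-aligned sub-buffer out of the backing allocation. */
static void *arena_alloc(arena_t *a, size_t size) {
    uintptr_t start = ((uintptr_t)a->base + a->used + (ARENA_ALIGN - 1))
                      & ~((uintptr_t)ARENA_ALIGN - 1);
    size_t offset = start - (uintptr_t)a->base;
    if (offset + size > a->capacity)
        return NULL;           /* caller falls back to a real alloc */
    a->used = offset + size;
    return (void *)start;
}

static void arena_free(arena_t *a) {
    free(a->base);             /* one free releases every sub-buffer */
}
```

Because each sub-buffer starts on a cache-line boundary, consecutive allocations pack together with at most 63 bytes of padding between them.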

Member Author
I wonder whether this should instead be an assert, with each alloc individually padded to the nearest 64 bytes.

I think that moves the complexity to other parts of the code, whereas right here it is nicely contained. However, in that function, I wouldn't mind moving that particular line of code to affect all malloc functions.

I wonder whether we should consider always making just a single alloc, and then distributing the needed buffers from that.

Sounds similar to what the Linux kernel needs, where it is all located on the stack. But I think the deflate/inflate structs have to be allocated differently than the window, because the window can be resized.

I'm unfamiliar with what Nginx needs. If you want to make a new PR to test some of these ideas out, we can wait on this PR.

Member
Nginx makes a single allocation for zlib/zlib-ng (with a different total allocation size for each) and then returns parts of that for each alloc call zlib/zlib-ng makes. If zlib/zlib-ng allocates too much, nginx logs a warning and makes another real alloc to avoid failing.
The idea is that nginx spends less CPU time doing alloc calls, leaving more time for real work; it is possibly also somewhat beneficial for cache locality to keep the allocations close together.

I don't know how much that actually helps, but each alloc call has some cost, and that cost is probably even higher now that there are so many CPU mitigations in play. Assuming that is true, reducing zlib init latency in the same way would be beneficial to other applications as well (web browsers, for example).

Hmm, makes me wonder whether we should have some kind of Google Benchmark test for deflate-init and inflate-init. That would give us a way to make sure we don't regress on how fast init completes.

Collaborator
Using asserts unnecessarily in production code is generally frowned upon, as an assertion failure causes termination of the program. It is very common to allocate a buffer that isn't a multiple of 64 bytes, so padding is, in my professional opinion, the better solution. We shouldn't forget that we already pad allocations to make sure optimized instructions can read the very last byte of "real" data without resorting to single-byte access.

As for nginx, it already does the allocation in an incompatible, unsupported way, and as such we shouldn't care if we break it with any change. Until the nginx patches are merged upstream, there is no justification for not just making new patches for nginx whenever something changes in zlib-ng.

@Dead2
Member

Dead2 commented Feb 2, 2024

Will pull this after #1654, since that also changes alloc sizes.

@Dead2 Dead2 merged commit 9d33c81 into zlib-ng:develop Feb 7, 2024
137 checks passed
@Dead2 Dead2 mentioned this pull request May 30, 2024
@Dead2 Dead2 mentioned this pull request Jun 12, 2024