Prefer HAVE_ALIGNED_ALLOC when available in zng_alloc #1635
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## develop #1635 +/- ##
===========================================
+ Coverage 83.03% 83.11% +0.08%
===========================================
Files 133 133
Lines 10896 10895 -1
Branches 2812 2811 -1
===========================================
+ Hits 9047 9055 +8
+ Misses 1147 1146 -1
+ Partials 702 694 -8
1470bc2 to dab5095
Added some more helpful comments for people who come across this code.
dab5095 to 32f8791
Weird OSS-Fuzz CI error, maybe it needs to be re-run.
Well, that is annoying. But it also kind of makes sense, at least when it comes to vector intrinsics. I guess that bug(?) has been hiding there all along.
b3bfb70 to 7201270
aligned_alloc, when used from C++, requires the C++17 standard. The zutil_p.h include was removed from test_crc32, since it was causing the same issue and was not really needed.
7201270 to 6c3a30e
#ifdef HAVE_POSIX_MEMALIGN
#ifdef HAVE_ALIGNED_ALLOC
/* Size must be a multiple of alignment */
size = (size + (64 - 1)) & ~(64 - 1);
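For readers unfamiliar with the rounding expression in the diff above, here is a minimal sketch of the same idea in isolation. The helper names `round_up` and `alloc_aligned64` are hypothetical and are not the zlib-ng functions; the sketch assumes a platform that provides C11 `aligned_alloc()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from zlib-ng): round size up to the next
 * multiple of align. align must be a power of two. */
static size_t round_up(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + (align - 1)) & ~(align - 1);
}

/* Hypothetical helper: return at least size bytes on a 64-byte boundary.
 * C11 aligned_alloc() expects the size to be a multiple of the alignment,
 * which is why the size is rounded up first. Release with free(). */
static void *alloc_aligned64(size_t size) {
    if (size > SIZE_MAX - 63)
        return NULL; /* rounding up would overflow */
    return aligned_alloc(64, round_up(size, 64));
}
```

With these definitions, a request for 100 bytes becomes a 128-byte allocation whose address is a multiple of 64.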
I am wondering whether perhaps this should rather be an assert, and each alloc should instead be individually padded to the nearest 64-bytes.
This ensures we always have a 64-byte aligned end to avoid hiding bugs, and a predictable allocation size. Except for macOS (maybe we should start using _aligned_alloc() and _aligned_free() there).
The only problem with either solution is that we end up allocing more memory than before, which will cause warnings in Nginx and require it to be patched, but no other application that we know of hard-codes what we can alloc.
I am wondering whether we should consider moving to always making just a single alloc, and then distribute the needed buffers from that. That would avoid Nginx or any other application needing to optimize away our alloc() syscalls, while also letting us more easily ensure that we always have the needed padding and alignment no matter what platform we are on. This also likely reduces memory usage slightly, because each buffer we delegate starts and ends on a 64-byte boundary, thus packing the allocations together more tightly. Optionally we could also make sure we only 64-align the buffers that need it, letting the remainder be 16-aligned.
If we go this route, we should perhaps go there directly, to avoid multiple changes that affect external applications like Nginx.
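To make the single-allocation idea concrete, here is a rough sketch of one block being carved into 64-byte-aligned buffers. Every name in it (`arena`, `pad64`, `arena_init`, `arena_take`, `arena_free`) is hypothetical and invented for illustration; it is not how zlib-ng is implemented, and it assumes C11 `aligned_alloc()` is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGN 64

/* Hypothetical arena: one real allocation, handed out in padded pieces. */
typedef struct {
    unsigned char *base; /* start of the single allocation */
    size_t used;         /* bytes handed out so far (multiple of ALIGN) */
    size_t capacity;     /* total bytes in the allocation */
} arena;

/* Pad n up to the next multiple of ALIGN. */
static size_t pad64(size_t n) {
    return (n + (ALIGN - 1)) & ~(size_t)(ALIGN - 1);
}

/* Make the one real allocation. The caller passes the sum of the padded
 * sizes of all buffers it intends to take. Returns nonzero on success. */
static int arena_init(arena *a, size_t total) {
    a->capacity = pad64(total);
    a->used = 0;
    a->base = aligned_alloc(ALIGN, a->capacity);
    return a->base != NULL;
}

/* Hand out the next n bytes. Because base is 64-byte aligned and every
 * piece is padded to 64 bytes, each buffer starts and ends on a boundary. */
static void *arena_take(arena *a, size_t n) {
    size_t padded = pad64(n);
    if (padded > a->capacity - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += padded;
    return p;
}

/* Release everything with a single free(). */
static void arena_free(arena *a) {
    free(a->base);
    a->base = NULL;
    a->used = a->capacity = 0;
}
```

A caller would size the arena as `pad64(size_a) + pad64(size_b) + ...` and then take each buffer in turn. A buffer that must be resized later cannot simply live inside such a block, so it would need separate handling.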
> I am wondering whether perhaps this should rather be an assert, and each alloc should instead be individually padded to the nearest 64-bytes.
I think that moves the complexity to other parts of the code, whereas right here it is nicely contained. However, in that function, I wouldn't mind moving that particular line of code so that it affects all malloc functions.
> I am wondering whether we should consider moving to always making just a single alloc, and then distribute the needed buffers from that.
Sounds similar to what the Linux kernel needs, where it is all located on the stack. But I think the deflate/inflate structs have to be allocated differently than the window, because the window can be resized.
I'm unfamiliar with what Nginx needs. If you want to make a new PR to test some of these ideas out, we can wait on this PR.
Nginx makes a single allocation for zlib/zlib-ng (different total allocation size) and then returns parts of that for each alloc call zlib/zlib-ng makes. If zlib/zlib-ng allocates too much, it will log a warning and make another real alloc to avoid failing.
The idea is that it makes nginx spend less cpu-time doing alloc calls, leaving more time for real work, and possibly it is also somewhat beneficial to keep the allocations close wrt cache.
How much that actually helps IDK, but each alloc call has some cost, and that cost is probably even higher now that there are so many cpu mitigations in play. And assuming that is true, then that is something that would be beneficial to other applications as well for reducing zlib init latency (web browsers for example).
Hmm, makes me wonder whether we should have some kind of Google Benchmark test for deflate-init and inflate-init. That would give us a way to make sure we don't have regressions in how fast init is completed.
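As an illustration of the pool pattern described in this comment, here is a small sketch of an allocator callback that serves requests from a preallocated buffer and falls back to a real allocation with a warning when the buffer is too small. This is not Nginx's code. The `pool` type and the function names are hypothetical, the callback shape is only modeled on zlib's `zalloc`/`zfree` signatures, and alignment handling is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical pool state, passed to the callbacks as the opaque pointer. */
typedef struct {
    unsigned char *buf; /* preallocated by the application */
    size_t size;        /* bytes in buf */
    size_t used;        /* bytes already handed out */
    unsigned fallbacks; /* how often the pool was too small */
} pool;

/* Allocation callback: serve from the pool when the request fits,
 * otherwise warn and make a real allocation so the caller does not fail.
 * Note: no alignment padding is applied in this sketch. */
static void *pool_alloc(void *opaque, unsigned items, unsigned size) {
    pool *p = opaque;
    size_t need = (size_t)items * size; /* may wrap where size_t is 32 bits */
    if (need <= p->size - p->used) {
        void *out = p->buf + p->used;
        p->used += need;
        return out;
    }
    p->fallbacks++;
    fprintf(stderr, "pool: %zu bytes requested, preallocation too small\n", need);
    return malloc(need);
}

/* Free callback: pool memory is released together with the pool itself,
 * so only fallback allocations are passed to free(). */
static void pool_free(void *opaque, void *ptr) {
    pool *p = opaque;
    uintptr_t addr = (uintptr_t)ptr;
    uintptr_t lo = (uintptr_t)p->buf;
    if (addr >= lo && addr < lo + p->size)
        return;
    free(ptr);
}
```

The `fallbacks` counter is where an application would notice that the library started allocating more than the pool was sized for, which is the situation the warning described above reports.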
Using asserts unnecessarily in production code is generally frowned upon, as an assertion failure causes termination of the program. It's very common to allocate a buffer that isn't a multiple of 64 bytes, so padding is, in my professional opinion, the better solution. We shouldn't forget that we already pad allocations to make sure optimised instructions can read the very last byte of "real" data without resorting to single-byte access.
As for nginx, it already does the allocation in an incompatible, unsupported way, and as such we shouldn't care if we break it with any change. Until the nginx patches are merged upstream, there is no justification for not just making patches for new nginx versions whenever something changes in zlib-ng.
Will pull this after #1654, since that also changes alloc sizes.
Added some more helpful comments for people who come across this code. I think people would otherwise skip to __APPLE__ and forget that different versions of macOS have other aligned memory functions.