More struct and deflate-related cleanups and optimizations #740
I noticed that you removed the |
Baseline 6264b5a
PR #740 254c3e7
About 0.4% faster on average.
Yes, that is actually what made most of the speed increase.
Side note: I wonder if the compiler can better optimize if returning |
Looks good to me, other than the comments made by @mtl1979, which should be fixed.
Codecov Report
@@            Coverage Diff            @@
##            develop    #740    +/-   ##
=========================================
+ Coverage   77.38%   77.43%   +0.04%
=========================================
  Files          69       69
  Lines        7614     7644     +30
  Branches     1312     1322     +10
=========================================
+ Hits         5892     5919     +27
+ Misses       1211     1206      -5
- Partials      511      519      +8
@nmoinvaz Normally the return value is in a register, so if you return a variable, the compiler can optimize the code so that the same register is used across the whole function. This assumes that the function doesn't call other functions. I don't think loading a constant into a register is much slower, as the compiler can zero- or sign-extend constants more easily than variables. In most cases the compiler also knows it doesn't need to spill the previous contents of the EAX/RAX register.
@mtl1979 Ah ok, so it probably doesn't really matter.
@nmoinvaz Unless there are a lot of other memory loads close to the return of the constant, the optimizer can most likely mask the difference by reordering so that the load happens without a pipeline stall. There is obviously a limit on how many memory loads can happen in one block of code (it depends on the processor architecture, but fewer than 5 is safe), so this doesn't apply to using constants in any other part of the code.
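To make the two styles being compared concrete, here is a minimal sketch in C. The function names and the DEMO_* status codes are hypothetical and are not taken from the PR; the only point is that both variants return the same values, so any difference is down to how the compiler allocates registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, for illustration only. */
enum { DEMO_OK = 0, DEMO_BUF_ERROR = -5 };

/* Variant A: every exit returns a literal constant. */
int status_const(const unsigned char *buf, size_t len) {
    if (buf == NULL || len == 0)
        return DEMO_BUF_ERROR;
    return DEMO_OK;
}

/* Variant B: one status variable is carried to a single return,
 * so the compiler may keep it in the return register throughout. */
int status_var(const unsigned char *buf, size_t len) {
    int ret = DEMO_OK;
    if (buf == NULL || len == 0)
        ret = DEMO_BUF_ERROR;
    return ret;
}

/* Checks that both variants agree on a few sample inputs. */
void compare_variants(void) {
    unsigned char byte = 0;
    assert(status_const(NULL, 0) == status_var(NULL, 0));
    assert(status_const(&byte, 0) == status_var(&byte, 0));
    assert(status_const(&byte, 1) == status_var(&byte, 1));
}
```

Whether either form is faster would have to be checked in the generated assembly for the compiler and target in question.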
Right, at least if the macros don't have them themselves. =/
         ERR_RETURN(strm, Z_BUF_ERROR);
     }

     /* User must not provide more input after the first FINISH: */
-    if (s->status == FINISH_STATE && strm->avail_in != 0) {
+    if (s->status == FINISH_STATE && strm->avail_in != 0)
         ERR_RETURN(strm, Z_BUF_ERROR);
@mtl1979 You meant this one, I guess.
That macro was already missing {} around it at line 787, but I'll add it to both just for consistency.
@Dead2 In most cases the macro itself shouldn't have {} around the actual code, but a lot of people use a dummy do or while block to avoid mysterious errors when the macro is used in places where only a single statement is expected. The macro requires explicit {} only when it declares new variables. Some older compilers actually had a maximum limit on nested block scopes, for example the old Linaro ARM compiler.
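For readers unfamiliar with the dummy-block idiom mentioned above, here is a small sketch in C. The macros and the classify() function are hypothetical and are not the project's ERR_RETURN; they only show why a brace-only macro breaks in an unbraced if/else while a do/while wrapper does not.

```c
#include <assert.h>

/* Hypothetical macros, for illustration only. */

/* Brace-only version: "if (c) SET_ERR_BRACES(p, e); else ..." expands to
 * "if (c) { ... }; else ...", and the stray ';' makes the else a syntax error. */
#define SET_ERR_BRACES(dst, err)  { *(dst) = (err); }

/* do/while(0) version: the caller's ';' completes the statement,
 * so the macro behaves like one ordinary statement everywhere. */
#define SET_ERR_WRAPPED(dst, err) do { *(dst) = (err); } while (0)

/* Returns -5 when input arrives after finishing, otherwise 1. */
int classify(int finished, int avail_in) {
    int status = 0;
    if (finished && avail_in != 0)
        SET_ERR_WRAPPED(&status, -5);  /* safe without braces around it */
    else
        status = 1;
    return status;
}

/* Exercises both branches of classify(). */
void check_classify(void) {
    assert(classify(1, 1) == -5);
    assert(classify(1, 0) == 1);
}
```

Swapping SET_ERR_WRAPPED for SET_ERR_BRACES in classify() should fail to compile, which is the kind of mysterious error the wrapper avoids.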
Rebased to fix merge conflicts with #735. Ran another benchmark too.
Actually slightly better than the previous two tests, but still within test-to-test variance.
This PR is a little all over the place, since I was hunting warnings all over, and it represents about 2 full days of testing and tweaking.
I did have to give up on shrinking s->heap, and also on getting rid of all of the bi_valid warnings, after numerous attempts from scratch, all resulting in incorrect deflate output data for unknown reasons (I have theories, but don't know exactly where the errors happen).
I don't know exactly how many warnings got fixed and/or silenced, but I'd estimate about 20 (excluding duplicates). There are still lots more, and quite a lot of them are false positives due to a GCC bug, which makes it hard to spot the real ones.
This shrinks the deflate internal_state struct by 8 bytes. Before:
After:
PS: To review this, please look at each commit separately, otherwise it'll be very difficult to get an overview of what is happening.
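The before/after layout of internal_state is not reproduced here. As a generic sketch of one common way a struct loses padding bytes (reordering members so wide fields come first), consider the C example below. The struct and member names are hypothetical, and the PR's commits may achieve the 8-byte reduction differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: narrow members on both sides of a wide one,
 * so the compiler typically pads after each narrow member. */
struct demo_loose {
    uint8_t  flag_a;
    uint64_t total;
    uint8_t  flag_b;
};

/* Same members, wide one first: the narrow members can share
 * one padded tail instead of each getting their own padding. */
struct demo_tight {
    uint64_t total;
    uint8_t  flag_a;
    uint8_t  flag_b;
};

/* Number of bytes saved by the reordering on the current ABI. */
size_t demo_saved_bytes(void) {
    assert(sizeof(struct demo_tight) <= sizeof(struct demo_loose));
    return sizeof(struct demo_loose) - sizeof(struct demo_tight);
}
```

The exact sizes depend on the target ABI's alignment rules, so a tool such as pahole is the reliable way to confirm the layout of the real struct.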