Move initialization of functable to init_functable function #1405
Quite likely, this change may finally allow removing all that
Atomic assignment is not guaranteed for pointers, but I don't know of any architecture where it would be dangerous in this
Perhaps on x64 (not even sure), but AFAIK on 32-bit it's guaranteed. In any case, zlib-ng stands out among libraries with this kind of dynamic dispatch: I haven't seen a single other one that uses TLS for this use case.
Force-pushed from fd9cfcc to 1b21e59.
On most CPUs it is guaranteed.
Codecov Report
Base: 83.17% // Head: 82.99% // Decreases project coverage.
Additional details and impacted files:
@@             Coverage Diff             @@
##           develop    #1405      +/-   ##
===========================================
- Coverage    83.17%   82.99%   -0.19%
===========================================
  Files          130      130
  Lines        10788    10803      +15
  Branches      2794     2794
===========================================
- Hits          8973     8966       -7
- Misses        1115     1136      +21
- Partials       700      701       +1
Force-pushed from 21e4a74 to 73d6139.
I think this looks like a good idea, and I give my tentative thumbs up 👍. I never liked the TLS kludge, and as a bonus this might even avoid the Stabilizer crash on TLS code (that is a bug in Stabilizer, but avoiding it would be nice). I still want to actually test this and have a closer look before approving it for merging, and I'd also like others to review this change, since it is relatively complex and a tiny error can cause us to incorrectly use a sub-optimal function, for example (as has happened before).
The diff is a mess, but it looks like it only moves initialization of the functable members from each stub into a new function. That will cause a slight slowdown, though it is in the sub-second range. In my opinion it really doesn't help the readability of the file.
functable.c (outdated)
#include "cpu_features.h"

Z_INTERNAL Z_TLS struct functable_s functable;
static void init_functable(void) {
    uint32_t (* adler32) (uint32_t adler, const uint8_t *buf, uint64_t len);
Why not use an intermediate `struct functable_s` instead of all these variables? Then, at the end of the function, copy one struct to the other.
Copying one struct to another is not atomic on all CPUs.
Copying one pointer to another is atomic on most CPUs.
But I think it's better to use an intermediate `struct functable_s` instead of separate variables.
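A minimal sketch of the suggestion above, assuming a cut-down two-member table. `struct cpu_features_s`, `detect_cpu_features`, and the `*_c`/`*_simd` functions are hypothetical stand-ins, not zlib-ng's real definitions. The table is filled into a local `struct functable_s` and then published one pointer at a time, with a comment explaining why.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins: zlib-ng's real table has many more
 * members, and its feature-detection types and names differ. */
struct cpu_features_s { int has_simd; };

struct functable_s {
    uint32_t (* adler32) (uint32_t adler, const uint8_t *buf, uint64_t len);
    uint32_t (* crc32) (uint32_t crc, const uint8_t *buf, uint64_t len);
};

/* Placeholder bodies; the return values only identify which one ran. */
static uint32_t adler32_c(uint32_t a, const uint8_t *b, uint64_t n) { (void)a; (void)b; (void)n; return 1; }
static uint32_t adler32_simd(uint32_t a, const uint8_t *b, uint64_t n) { (void)a; (void)b; (void)n; return 2; }
static uint32_t crc32_c(uint32_t c, const uint8_t *b, uint64_t n) { (void)c; (void)b; (void)n; return 3; }
static uint32_t crc32_simd(uint32_t c, const uint8_t *b, uint64_t n) { (void)c; (void)b; (void)n; return 4; }

static void detect_cpu_features(struct cpu_features_s *cf) {
    cf->has_simd = 1;  /* pretend the optimized path is available */
}

struct functable_s functable;  /* shared by all threads (no TLS) */

static void init_functable(void) {
    struct functable_s ft;     /* intermediate table, local to this call */
    struct cpu_features_s cf;

    detect_cpu_features(&cf);  /* the single feature-detection point */

    /* Generic defaults first, then optimized overrides. */
    ft.adler32 = adler32_c;
    ft.crc32 = crc32_c;
    if (cf.has_simd) {
        ft.adler32 = adler32_simd;
        ft.crc32 = crc32_simd;
    }

    /* Copy each pointer individually instead of `functable = ft;`:
     * a struct copy may be compiled to a byte-wise loop (e.g. rep movsb),
     * letting another thread see a half-written pointer. This relies on
     * pointer-sized stores being atomic on the target; plain C does not
     * guarantee that. */
    functable.adler32 = ft.adler32;
    functable.crc32 = ft.crc32;
}
```

Building the whole table locally keeps the selection logic in one readable place, while the final per-member copy is the only part that other threads can observe.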
> Copying one struct to another is not atomic on all CPUs.
> Copying one pointer to another is atomic on most CPUs.
There should be a comment where it copies each one individually, explaining why.
Yes, `functable_s` is a better idea. Copying the whole table vs. pointer by pointer wouldn't make a difference, as long as individual pointers are copied atomically.
> Copying the whole table vs. pointer by pointer wouldn't make a difference, as long as individual pointers are copied atomically.
No. There are no guarantees about how struct copying will be implemented.
The compiler can use `rep movsb` (x86), and then the copy will be done byte by byte rather than pointer by pointer (with a pointer-sized copy instruction).
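If the per-pointer copy should not depend on what a given CPU happens to do, C11 atomics can make it explicit. The following is a sketch only, with a hypothetical one-member table (`table_adler32`) and placeholder functions; it is not how zlib-ng itself is written. An `_Atomic` function pointer is always loaded and stored as a whole, and release/acquire ordering pairs the writer with readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t (*adler32_fn)(uint32_t adler, const uint8_t *buf, uint64_t len);

/* Placeholder implementations (hypothetical). */
static uint32_t adler32_generic(uint32_t a, const uint8_t *b, uint64_t n) { (void)b; (void)n; return a + 1; }
static uint32_t adler32_fast(uint32_t a, const uint8_t *b, uint64_t n) { (void)b; (void)n; return a + 2; }

/* Unlike a member copied as part of a plain struct, an _Atomic object is
 * always read and written as a whole, however the compiler implements it. */
static _Atomic(adler32_fn) table_adler32 = adler32_generic;

/* Writer: release ordering keeps earlier writes (e.g. detected feature
 * flags) from being reordered after the pointer store. */
static void publish_adler32(adler32_fn fn) {
    atomic_store_explicit(&table_adler32, fn, memory_order_release);
}

/* Reader: acquire pairs with the release store above. */
static uint32_t call_adler32(uint32_t adler, const uint8_t *buf, uint64_t len) {
    adler32_fn fn = atomic_load_explicit(&table_adler32, memory_order_acquire);
    return fn(adler, buf, len);
}
```

On targets where pointer-sized stores are already atomic, these operations typically compile to ordinary loads and stores plus any ordering the memory model requires, so the cost is small.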
The slowdown of the first call will be smaller than the combined slowdown of all the stub functions before this PR, since the compiler will most likely merge the identical `if` checks.
I think any reasonable programmer would understand that it's pointless to discuss performance in this case. It just makes no sense at all, and it distracts from the stuff that matters. The code had bugs, and mystical perf is secondary to correctness, especially when it's crystal clear that this has no perf overhead. It's not as if it were called once per 100-byte block. Even if it were called once per compress/uncompress sequence, it would still be a non-issue.
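To illustrate why the init cost only lands on the first call, here is a sketch of the stub pattern with hypothetical names (`adler32_stub`, `adler32_impl`, a one-member table) and an init counter added purely for demonstration. It ignores thread safety, which the pointer-store discussion above covers. After the first call, the table points directly at the selected implementation, so later calls never go through the stub again.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*adler32_fn)(uint32_t adler, const uint8_t *buf, uint64_t len);

static uint32_t adler32_stub(uint32_t adler, const uint8_t *buf, uint64_t len);

static uint32_t adler32_impl(uint32_t adler, const uint8_t *buf, uint64_t len) {
    (void)buf; (void)len;
    return adler + 1;  /* placeholder for the real checksum */
}

/* The table starts out pointing at the stub. */
static struct { adler32_fn adler32; } functable = { adler32_stub };

static int init_calls;  /* counts how often the setup runs (demo only) */

static void init_functable(void) {
    init_calls++;
    /* In the PR's design, CPU feature checks and selection happen here. */
    functable.adler32 = adler32_impl;
}

/* Stub: initialize, then forward to the selected implementation. */
static uint32_t adler32_stub(uint32_t adler, const uint8_t *buf, uint64_t len) {
    init_functable();
    return functable.adler32(adler, buf, len);
}
```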
I've been a programmer for a few months short of 40 years... And I still see your argument as overly simplified... I'm not saying this PR is an utter waste of time; I'm just saying it can be improved further before we approve it.
@pps83
I apologize :)
> Some could argue that thread-safety of CPU feature checks should indeed be a separate PR, but one reviewed before this PR instead of after.
I don't understand you.
If we support CPUs with different functions on different cores, we need TLS. But then the current implementation is broken, since `cpu_check_features` does not use TLS. TLS is not free, so it should be used sparingly. Because of this, I think it's best to have a single point where `cpu_check_features` is called and all versioned functions are selected.
If we don't support such CPUs, we don't need TLS. In that case we need atomic initialization of the pointers in `functable`, and that is also easier to do in one place.
So I think changing `cpu_check_features` will be easier after this PR.
I didn't write the current implementation of CPU feature checking... It was written gradually, a long time before we even had the function table. Like I said elsewhere in this PR, I already know the current implementation is buggy, but instead of introducing more bugs or regressions, we should fix the current ones first...
I've been a developer long enough to notice when people try to mix opinions with facts. This only complicates reviewing pull requests, as none of us knows everything about every part of the code, so when people try to confuse us, we need to research to make sure the information we have is actually factual and not just the result of trolling.
A lot of the time I might sound like a real bitch or arrogant, but that is just to make sure every pull request gets reviewed properly, so we don't need to revert or rewrite it later and waste valuable time. @nmoinvaz and I have been talking about improving the CI to catch even more issues, but those improvements have not been merged in yet.
I agree that readability somewhat suffers.
#include "cpu_features.h"

Z_INTERNAL Z_TLS struct functable_s functable;
Where did the global `struct functable_s functable` go?
It was always at the end of the file. I find it absolutely strange that this code compiles in C; in C++ it's a compilation error.
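In C, declaring the same object again with the same type is silently accepted, much like a forward declaration (the standard calls a file-scope declaration without an initializer a tentative definition). This allows resolving possible circular dependencies.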
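A small sketch of the C rule being discussed, using a hypothetical one-member `struct functable_s` and `inc_impl` rather than zlib-ng's real table. The first file-scope declaration has no initializer, so it is a tentative definition; code after it can already use the object, and the later declaration with an initializer completes it. Compiled as C++, the second line is a redefinition error.

```c
#include <assert.h>

/* Hypothetical one-member table for illustration. */
struct functable_s { int (*inc)(int); };

static int inc_impl(int x) { return x + 1; }

/* Tentative definition: no initializer, so C treats it as a definition
 * that may still be completed later in this translation unit. */
struct functable_s functable;

/* Code between the two declarations can already use the object. */
static int call_inc(int x) { return functable.inc(x); }

/* The definition with an initializer; both declarations refer to the
 * same object. In C++ this line is a redefinition error. */
struct functable_s functable = { inc_impl };
```

This only works within one translation unit; the usual way to share the object with other files is an `extern` declaration in a header.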
Force-pushed from 73d6139 to 1e5a583.
Yes, the code for initializing the functable members was simply moved to a standalone function.
There is no slowdown. Not sure what else to say about it; I guess it's quite clear.
The PR is made so that the change is easier to review. Follow up change to
IMO, the function pointer init code should have been in a separate function from the start, not spread out over different stub functions.
There are almost as many opinions as there are contributors to zlib-ng... Sometimes we have to make compromises to get things merged in. There was a reason why each stub only altered its own member in the function table: the initial implementation didn't use TLS, and many programs did use zlib from more than one thread, so there was quite a high chance that one thread could overwrite or corrupt data written by another thread.
I wanted to edit my reply to explain why, but cancelled the edit. The idea is that initializing function pointers based on testing CPU features is a vastly different concept from actually calling through function pointers. It's usually better to keep conceptually different things in different functions. Because all this was mixed together, current zlib-ng ended up with hidden bugs: some of these stub functions do not test CPU features. This PR not only does some cleanup, it also fixes this bug.
Mixing cleanup and bug fixes is generally discouraged. Some stub functions are not supposed to be the first function table member called; they are always called after another member function, so they don't actually need to initialize the feature flags based on
This approach complicates the code and adds non-obvious dependencies.
I won't go into detail about why I made the change, but while doing it I noticed that zlib-ng had this bug. There is also a PR that fixes it. My change fixes it as well, since there is only one init function now.
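A sketch of why a single init function removes the ordering dependency described above. All names (`detect_features`, `cpu_has_fast`, the `*_generic`/`*_fast` functions) are hypothetical. Because every stub calls the same `init_functable`, feature detection always runs before any pointer is selected, even when a stub that is "normally called second" happens to be called first.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*cksum_fn)(uint32_t v, const uint8_t *buf, uint64_t len);

static uint32_t adler32_stub(uint32_t v, const uint8_t *buf, uint64_t len);
static uint32_t crc32_stub(uint32_t v, const uint8_t *buf, uint64_t len);

/* Placeholder implementations; the return value identifies which one ran. */
static uint32_t adler32_generic(uint32_t v, const uint8_t *b, uint64_t n) { (void)v; (void)b; (void)n; return 10; }
static uint32_t adler32_fast(uint32_t v, const uint8_t *b, uint64_t n) { (void)v; (void)b; (void)n; return 11; }
static uint32_t crc32_generic(uint32_t v, const uint8_t *b, uint64_t n) { (void)v; (void)b; (void)n; return 20; }
static uint32_t crc32_fast(uint32_t v, const uint8_t *b, uint64_t n) { (void)v; (void)b; (void)n; return 21; }

static struct { cksum_fn adler32, crc32; } functable = { adler32_stub, crc32_stub };

static int cpu_has_fast;  /* hypothetical feature flag; 0 until detection runs */

static void detect_features(void) { cpu_has_fast = 1; }

/* One init function: detection always runs before any pointer is chosen,
 * no matter which stub is called first. */
static void init_functable(void) {
    detect_features();
    functable.adler32 = cpu_has_fast ? adler32_fast : adler32_generic;
    functable.crc32 = cpu_has_fast ? crc32_fast : crc32_generic;
}

static uint32_t adler32_stub(uint32_t v, const uint8_t *buf, uint64_t len) {
    init_functable();
    return functable.adler32(v, buf, len);
}

/* If this stub instead read cpu_has_fast without running detection itself
 * (the per-stub pattern discussed above), calling it first would see 0
 * and permanently select crc32_generic. */
static uint32_t crc32_stub(uint32_t v, const uint8_t *buf, uint64_t len) {
    init_functable();
    return functable.crc32(v, buf, len);
}
```

Calling `functable.crc32` before anything else still selects `crc32_fast`, and the adler32 entry is fixed up in the same pass.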
This part of the code has been neglected for quite a while as people have worked on more "important" bugs in the codebase. I'm just trying to make sure the fix in this PR is the best possible, and that means listening to comments from as many people as possible before making the final choice of approving or rejecting it. Like I've said in most controversial pull requests, at least two of the long-time contributors must approve the PR before it can be merged. Since I started contributing, I've mostly reviewed pull requests that involve Windows (and other non-Linux operating systems, including FreeBSD, OpenBSD and NetBSD) or non-x86 processors, as my knowledge of modern x86 processors and their instruction sets is quite limited.
Force-pushed from df23096 to 291e637.
`functable` is already declared in functable.h, which is included by functable.c.
Force-pushed from 291e637 to fb02678.
@Dead2 ^^^
This PR also fixes an issue where feature detection wasn't done for each of the stub functions (fixes #1172).