Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move initialization of functable to init_functable function #1405

Merged
merged 4 commits into from
Feb 3, 2023

Conversation

pps83
Copy link
Contributor

@pps83 pps83 commented Jan 13, 2023

This PR also fixes an issue where feature detection wasn't done for each of the stub functions (fixes #1172).

@pps83
Copy link
Contributor Author

pps83 commented Jan 13, 2023

Note, I didn't try to merge checks, instead, the goal was to make the change less invasive. That is, it was done in a semi-automatic way:

image

same goes for all other stub functions

@pps83
Copy link
Contributor Author

pps83 commented Jan 13, 2023

Quite likely, this change may finally allow removing all that Z_TLS cruft.

@phprus
Copy link
Contributor

phprus commented Jan 13, 2023

Atomic assignment is not guaranteed for pointers.

But, I do not know architectures where it would be dangerous in this functable case.

@pps83
Copy link
Contributor Author

pps83 commented Jan 13, 2023

Atomic assignment is not guaranteed for pointers.

perhaps on x64 (not even sure), but afaik for 32-bit it's guaranteed. In any case, zlib-ng stands out among all other libs with such dynamic dispatch: I haven't seen a single one that uses tls for this usecase.

@phprus
Copy link
Contributor

phprus commented Jan 13, 2023

Most CPU - guaranteed.
C standard - not guaranteed.

@codecov
Copy link

codecov bot commented Jan 13, 2023

Codecov Report

Base: 83.17% // Head: 82.99% // Decreases project coverage by -0.19% ⚠️

Coverage data is based on head (fb02678) compared to base (750a7a0).
Patch coverage: 79.11% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1405      +/-   ##
===========================================
- Coverage    83.17%   82.99%   -0.19%     
===========================================
  Files          130      130              
  Lines        10788    10803      +15     
  Branches      2794     2794              
===========================================
- Hits          8973     8966       -7     
- Misses        1115     1136      +21     
- Partials       700      701       +1     
Flag Coverage Δ
macos_clang 42.97% <ø> (ø)
macos_gcc 72.87% <60.00%> (-0.31%) ⬇️
ubuntu_clang 82.21% <61.60%> (-0.59%) ⬇️
ubuntu_clang_debug 81.93% <68.61%> (-0.51%) ⬇️
ubuntu_clang_inflate_allow_invalid_dist 81.87% <61.60%> (-0.59%) ⬇️
ubuntu_clang_inflate_strict 82.20% <61.60%> (-0.59%) ⬇️
ubuntu_clang_mmap 82.32% <59.82%> (-0.61%) ⬇️
ubuntu_clang_pigz 13.69% <35.55%> (+0.02%) ⬆️
ubuntu_clang_pigz_no_optim 11.38% <58.58%> (+0.44%) ⬆️
ubuntu_clang_pigz_no_threads 13.51% <30.59%> (-0.08%) ⬇️
ubuntu_clang_reduced_mem 82.70% <61.60%> (-0.60%) ⬇️
ubuntu_gcc 74.33% <69.62%> (-0.32%) ⬇️
ubuntu_gcc_aarch64 76.61% <65.76%> (-0.36%) ⬇️
ubuntu_gcc_aarch64_compat_no_opt 75.13% <61.22%> (-0.35%) ⬇️
ubuntu_gcc_aarch64_no_acle 75.63% <64.48%> (-0.37%) ⬇️
ubuntu_gcc_aarch64_no_neon 75.64% <62.74%> (-0.36%) ⬇️
ubuntu_gcc_armhf 76.68% <65.76%> (-0.36%) ⬇️
ubuntu_gcc_armhf_compat_no_opt 75.09% <61.22%> (-0.36%) ⬇️
ubuntu_gcc_armhf_no_acle 76.66% <64.48%> (-0.38%) ⬇️
ubuntu_gcc_armhf_no_neon 76.58% <62.74%> (-0.37%) ⬇️
ubuntu_gcc_armsf 74.04% <51.28%> (-0.56%) ⬇️
ubuntu_gcc_armsf_compat_no_opt 73.58% <51.28%> (-0.53%) ⬇️
ubuntu_gcc_benchmark 72.97% <50.63%> (-0.59%) ⬇️
ubuntu_gcc_compat_no_opt 76.12% <61.22%> (-0.37%) ⬇️
ubuntu_gcc_compat_sprefix 73.01% <50.63%> (-0.42%) ⬇️
ubuntu_gcc_m32 72.42% <50.63%> (-0.62%) ⬇️
ubuntu_gcc_no_avx2 73.75% <65.32%> (-0.24%) ⬇️
ubuntu_gcc_no_ctz 74.13% <51.28%> (-0.56%) ⬇️
ubuntu_gcc_no_ctzll 74.15% <51.28%> (-0.56%) ⬇️
ubuntu_gcc_no_pclmulqdq 73.54% <68.46%> (-0.33%) ⬇️
ubuntu_gcc_no_sse2 73.82% <66.93%> (-0.32%) ⬇️
ubuntu_gcc_no_sse4 73.48% <69.62%> (-0.31%) ⬇️
ubuntu_gcc_o1 73.33% <68.14%> (-0.32%) ⬇️
ubuntu_gcc_osb ∅ <ø> (∅)
ubuntu_gcc_pigz 38.25% <47.50%> (-0.09%) ⬇️
ubuntu_gcc_pigz_aarch64 39.46% <48.10%> (-0.14%) ⬇️
ubuntu_gcc_ppc 73.27% <51.28%> (-0.43%) ⬇️
ubuntu_gcc_ppc64 73.86% <51.28%> (-0.53%) ⬇️
ubuntu_gcc_ppc64le 73.95% <51.28%> (-0.54%) ⬇️
ubuntu_gcc_ppc_no_power8 74.06% <51.28%> (-0.52%) ⬇️
ubuntu_gcc_s390x 74.23% <51.28%> (-0.56%) ⬇️
ubuntu_gcc_s390x_dfltcc 71.44% <51.28%> (-0.37%) ⬇️
ubuntu_gcc_s390x_dfltcc_compat 73.37% <61.61%> (-0.23%) ⬇️
ubuntu_gcc_s390x_no_crc32 74.02% <51.28%> (-0.56%) ⬇️
ubuntu_gcc_sparc64 74.26% <51.28%> (-0.57%) ⬇️
ubuntu_gcc_sprefix 72.49% <50.63%> (-0.60%) ⬇️
win64_gcc_compat_no_opt 74.30% <51.28%> (-0.51%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
functable.c 69.81% <79.11%> (-7.96%) ⬇️
arch/x86/adler32_avx512_tpl.h 97.87% <0.00%> (-2.13%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@pps83 pps83 force-pushed the develop-init_functable branch 2 times, most recently from 21e4a74 to 73d6139 Compare January 17, 2023 00:27
@Dead2
Copy link
Member

Dead2 commented Jan 19, 2023

I think this looks like a good idea, and I give my tentative thumbs up 👍.

I never liked the TLS kludge, and as a bonus this might even avoid the Stabilizer crash on TLS code (That is a bug in Stabilizer, but avoiding it would be nice).

I still want to actually test this and have a closer look before approving this for merging, and I'd also like others to review this change since it is relatively complex and a tiny error can cause us to incorrectly use a sub-optimal function for example (as we have had happen before).

@mtl1979
Copy link
Collaborator

mtl1979 commented Jan 19, 2023

The diff is a mess, but it looks like only moving initialization of the functable members from each stub to a new function, which will cause slight slowdown, but that is sub-second range. It really doesn't help readability of the file in my opinion.

functable.c Outdated
#include "cpu_features.h"

Z_INTERNAL Z_TLS struct functable_s functable;
static void init_functable(void) {
uint32_t (* adler32) (uint32_t adler, const uint8_t *buf, uint64_t len);
Copy link
Member

@nmoinvaz nmoinvaz Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use an intermediate struct functable_s instead of all these variables? Then at the end of the function copy one struct to another.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy one struct to another is not atomic on all CPUs.
Copy one pointer to another is atomic on most CPUs.

But I think it's better to use an intermediate struct functable_s instead of separate variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy one struct to another is not atomic on all CPUs.
Copy one pointer to another is atomic on most CPUs.

It should have a comment where it copies each one individually telling why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, functable_s is a better idea. Copying whole table vs pointer by pointer wouldn't make a difference as long as individual pointers are copied atomically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying whole table vs pointer by pointer wouldn't make a difference as long as individual pointers are copied atomically.

No. There are no guarantees how struct copying will be implemented.
The compiler can use rep movsb (x86) and then the copy will be byte by byte, not pointer to pointer (by pointer-size copy instruction).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slowdown of the first call will be less than the total slowdown of all functions before RP, since the compiler will most likely combine the same if.

I think any reasonably programmer would understand that it's pointless to discuss performance in this case. It just make no sense at all, it distracts from stuff that matters. The code had bugs, and mystical perf is second to correctness, especially when it's crystal clear that this thing has no perf overhead. It's not like it called once per 100-byte block. Even if it was called once per compres/uncompress sequence it's still would be a non-issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been programmer for few months short of 40 years.... And I see your argument is still overly simplified... I'm not saying this PR is utter waste of time, I'm just saying it can be further improved before we approve it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pps83
I apologize :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtl1979

Some could argue that thread-safety of CPU feature checks should indeed be separate PR, but reviewed before instead of after this PR.

I don't understand you.

If we support CPU with different functions on different cores we need TLS.
Then the current implementation is broken since cpu_check_features does not use TLS. TLS is not free, so it should be used less.
Because of this, I think it's best to have one point of cpu_check_features call and select for all versioned features.

If we don't we not need TLS.
In this case we need atomic initialization of pointers in functable. And it will also be easier to do it in one place.

So I guess changing cpu_check_features will be easier after this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't write the current implementation of CPU feature checking... It was written gradually a long time before we even had function table. Like I said elsewhere in this PR, I already know the current implementation is buggy, but instead of introducing more bugs or regressions, we should fix the current ones first...

I've been developer for long enough time to notice when people try to mix opinions with facts. This will only complicate reviewing pull requests as none of us know everything about every part of the code, so when people try to confuse us, we need to research to make sure what information we have is actually factual and not just result of trolling.

A lot of time I might sound like a real bitch or arrogant, but that is just to make sure every pull request gets reviewed properly so we don't need to revert or rewrite them later wasting valuable time. @nmoinvaz and I have been talking about improving the CI to catch even more issues, but those improvements have not been merged in yet.

functable.c Outdated Show resolved Hide resolved
@nmoinvaz
Copy link
Member

I agree, that readability some what suffers.

#include "cpu_features.h"

Z_INTERNAL Z_TLS struct functable_s functable;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did global struct functable_s functable go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was always at the end of the file. I find it absolutely strange that this code compiles in c, in c++ that's a compilation error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redefining something to same type is silently ignored as forward declaring in C. This allows resolving possible circular dependency.

@nmoinvaz nmoinvaz added the cleanup Improving maintainability or removing code. label Jan 19, 2023
functable.c Fixed Show fixed Hide fixed
@pps83
Copy link
Contributor Author

pps83 commented Jan 22, 2023

The diff is a mess, but it looks like only moving initialization of the functable members from each stub to a new function,

Yes, the code for initialization of functable members was simply moved to standalone function.

which will cause slight slowdown, but that is sub-second range.

There is no slowdown. Not sure what else to say about it, I guess it's quite clear.

It really doesn't help readability of the file in my opinion.

The PR is made so that the change is easier to review. Follow up change to init_functable might be a good idea so that it follows not the messy way of function by function init, but instead has different blocks for cpu features. Similar to this code: https://github.com/mirror/x264/blob/master/common/x86/mc-c.c#L796

IMO, from the start function pointer init code should have been in a separate function, not spread out over different stub functions.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jan 22, 2023

IMO, from the start function pointer init code should have been in a separate function, not spread out over different stub functions.

There is almost as many opinions as there is contributors to zlib-ng... Sometimes we have to make compromises to get things merged in. There was a reason why each stub only altered own member in the function table as the initial implementation didn't use TLS and many programs did use zlib from more than one thread, so there was quite high chance that one thread could overwrite or corrupt data written by another thread.

@pps83
Copy link
Contributor Author

pps83 commented Jan 22, 2023

IMO, from the start function pointer init code should have been in a separate function, not spread out over different stub functions.

There is almost as many opinions as there is contributors to zlib-ng... Sometimes we have to make compromises to get things merged in. There was a reason why each stub only altered own member in the function table as the initial implementation didn't use TLS and many programs did use zlib from more than one thread, so there was quite high chance that one thread could overwrite or corrupt data written by another thread.

I wanted to edit my reply to explain why, but cancelled the edit. The idea is that initialization of function pointers based on testing cpu features is vastly different concept than actual calling by function pointers. It’s usually better to keep conceptually different stuff in different functions. Because all this was mixed current zlib-ng ended up to have hidden bugs: some of these stub functions do not test cpu features. This PR not only does some clean up, it also fixes this bug.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jan 22, 2023

I wanted to edit my reply to explain why, but cancelled the edit. The idea is that initialization of function pointers based on testing cpu features is vastly different concept than actual calling by function pointers. It’s usually better to keep conceptually different stuff in different functions. Because all this was mixed current zlib-ng ended up to have hidden bugs: some of these stub functions do not test cpu features. This PR not only does some clean up, it also fixes this bug.

Mixing cleanup and bug fixes is generally discouraged. Some stub functions are not supposed to be first function table member to be called, they are always called after another member function is called first, so they don't actually need to initialize the feature flags based on cpuid or similar kernel feature.

@phprus
Copy link
Contributor

phprus commented Jan 22, 2023

Some stub functions are not supposed to be first function table member to be called, they are always called after another member function is called first, so they don't actually need to initialize the feature flags based on cpuid or similar kernel feature.

This approach complicates the code and adds non-obvious dependencies.
Merge initialization code in single function solves this problem without overhead.

@pps83
Copy link
Contributor Author

pps83 commented Jan 22, 2023

Mixing cleanup and bug fixes is generally discouraged.

I won't go into details why I made the change, but while doing it I noticed that zlib-ng had the bug. There is also a PR that fixes it. My change fixes it as well, since there only one init function now.

@mtl1979
Copy link
Collaborator

mtl1979 commented Jan 22, 2023

This approach complicates the code and adds non-obvious dependencies. Merge initialization code in single function solves this problem without overhead.

This part of the code has been neglected for quite a while as people have worked on more "important" bugs in the codebase. I'm just trying to make sure the fix in this PR is the best possible and that means listening to comments by as many people as possible before making the final choice of approving or rejecting.

Like I've said in most controversial pull requests, at least two of the long-time contributors must approve the PR before it can be merged. Since I started contributing, I've mostly reviewed pull requests that involve Windows (and other non-Linux operating systems, including FreeBSD, OpenBSD and NetBSD) or non-x86 processors, as my knowledge of modern x86 processors and their instruction sets is quite limited.

functable.c Show resolved Hide resolved
`functable` is already declared by functable.h which is included by functable.c
@pps83
Copy link
Contributor Author

pps83 commented Jan 31, 2023

@Dead2 ^^^

@Dead2 Dead2 merged commit d144fc0 into zlib-ng:develop Feb 3, 2023
@pps83 pps83 deleted the develop-init_functable branch February 3, 2023 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup Improving maintainability or removing code.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants