-
-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
functable
without TLS
#1609
functable
without TLS
#1609
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #1609 +/- ##
========================================
Coverage 83.02% 83.03%
========================================
Files 133 133
Lines 10895 10896 +1
Branches 2816 2816
========================================
+ Hits 9046 9047 +1
- Misses 1146 1147 +1
+ Partials 703 702 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
I think powerpc has 32 bit atomics but not necessarily 64 bit. This is just a vague recollection, though. Hmm, quick bit of googling suggests that 32 bit powerpc didn't have 64 bit atomics, so, maybe not a big deal? Your PR also relies on the implicit 64 bit writes being untearable, rather than something explicit. Maybe we could use explicit atomic integer storage for the pointers? The compiler is likely to convert them to plain old stores on architectures that do have this implicit behavior (at least it does with C++'s std::atomic, I would assume as much for the compiler intrinsics in C). |
https://en.cppreference.com/w/c/atomic/atomic_store Requires C11, but a possible solution. There's also these, though they are more GCC specific. Maybe we could wrap them in something to shim for everything we currently support: Uglier, but, these sorts of things guarantee correct behavior for weaker memory ordering and more importantly, torn stores. We'd only need to wrap the stores, the loads of course don't need any sync semantics of any kind because even if not cache coherent, it's either valid or it's null and if it's null, it can just reassign the pointer to the same address on its own. |
@KungFuJesus C11 atomic require "atomic object types". For this reason I use the builtin functions of gcc and clang. |
For this PR, the platform must provide single guarantee: Atomic store of pointer-size variable. |
functable.c
Outdated
} | ||
|
||
#define STUB_BODY(RET, FUNC_NAME, ...) \ | ||
struct functable_s ft; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be static
? Am I missing something, but wouldn't this evaluate the entire functable
for each function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
A stub may be called more than once only if the first call of versioned functions is parallel. To avoid race we need local variable.
Right, it doesn't much matter on ppc32 if your pointer is 32 bit. The C11 bits are portable, at least to C11 capable compilers, anyway. The atomic_store macro approach here should do the job just fine, but it would involve wrapping it seven ways to sunday for other platforms. MSVC, GCC, and Clang covers a lot, but there's also Solaris Studio, ICC, XL, Open64, etc. Admittedly obscure, but if we could rely on C11 for those, it might be multiple birds with one stone. |
Quick reply: No. If ppc32 guarantees 32-bit write atomicity and ppc64 guarantees 64-bit write atomicity, then PPC support can be added. |
Oh sorry I followed the link and see what you're getting at. We'd need to decorate everything with that atomic keyword, which would implicitly leverage the atomic loads.
I think that it does.
Right, but, with the compiler guarantees, the hardware atomicity would come with it. The compiler should be doing the right thing under the hood (for most ISAs, nothing at all, for others, using whatever atomic store is available with cache aware compare and exchanges / mutexes if needed). The GCC and MSVC paths you have in that macro do, at the cost of only supporting 3 compilers. I agree though that we should be able to avoid the atomic load, the C11 approach might make that difficult. |
GCC-compatible, Clang-compatible and MSVC. This is most of modern compilers. |
No doubt GCCisms are widely supported. I can think of one tiny minor contribution I made for something that didn't: |
I'm also wondering, however difficult, if we could write a test case that tries to induce a torn store to make sure. I have SPARC, PowerPC, and ARM hardware I can test against for it. Unfortunately reliably producing a race with UB is probably impossible to do deterministically. |
CI errors:
|
Yep, been getting that one a lot recently. Trying again usually resolves it. |
Converted to draft. |
17f00a6
to
ac98028
Compare
I did a few quick benchmarks just to make sure there are no surprises. On x86-64 it shows a 0.3% average compression speedup, practically no change for decompression. Since these speedups are marginal and the tests were not extensive, the most interesting part is that they are not slower and that they always seem to err on the side of faster. I'd say that bodes well for this PR.
|
ac98028
to
47d2ba2
Compare
All modern architectures implement atomic assignment for pointer. |
0d08ef0
to
511d131
Compare
Rebased. |
CI error:
Again... |
Unfortunately there is not much we can do about that, except report it to Codecov. |
Re-queued the failed run, all clear now. |
functable.c
Outdated
# define FUNCTABLE_ASSIGN(VAR, FUNC_NAME) \ | ||
_InterlockedExchangePointer((void * volatile *)&(functable.FUNC_NAME), (void *)(VAR->FUNC_NAME)) | ||
#else | ||
(void * volatile)(functable.FUNC_NAME) = (void *)(VAR->FUNC_NAME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a reference that confirms that this is safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't have a public link to the discussion :(
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
511d131
to
ea81350
Compare
New version:
|
It seems ok to me. It looks like it is just trading an implementation that compiler specific for one that is platform specific. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Just a note here, zlib-ng already requires minimum C11 from version 2.1.0 and up (it was upped from C99 to C11 because of So perhaps this is all that is really needed and the selection could be simplified? It is nice to have a fallback in any case though, you never know what some compilers support. But possibly this could also be used for MSVC and M_ARM, if they really do support C11 properly. |
To answer my own question; Yes MSVC supports it, but only added that support in 2022. It would be nice if we detected this Anyone have thoughts around this? |
@Dead2 |
Previous discussion in issue #1608.