
Synchronization primitive improvements #3918

Merged: 5 commits, Dec 18, 2019

Conversation

@kprotty (Member) commented Dec 16, 2019

SpinLock

  • removed Backoff
  • use better yielding per platform
  • added tryAcquire(): ?Held (and deinit() for Mutex compatibility); see the sketch below
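
For context, a minimal sketch of the tryAcquire(): ?Held shape, written in present-day Zig syntax rather than the 2019-era std API. Names and orderings here are illustrative, not the actual std.SpinLock internals:

```zig
const std = @import("std");

// Minimal test-and-set spinlock sketch. The real implementation also
// yields differently per platform in acquire(); this only hints at it.
const SpinLock = struct {
    locked: bool = false,

    pub const Held = struct {
        lock: *SpinLock,

        pub fn release(self: Held) void {
            @atomicStore(bool, &self.lock.locked, false, .release);
        }
    };

    pub fn tryAcquire(self: *SpinLock) ?Held {
        // Xchg returns the previous value: seeing `false` means we took the lock.
        if (@atomicRmw(bool, &self.locked, .Xchg, true, .acquire) == false)
            return Held{ .lock = self };
        return null;
    }

    pub fn acquire(self: *SpinLock) Held {
        while (true) {
            return self.tryAcquire() orelse {
                // A real implementation yields per platform here
                // (e.g. a pause instruction or sched_yield).
                std.atomic.spinLoopHint();
                continue;
            };
        }
    }
};
```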

ResetEvent

  • simpler interface (no bool returns, sketched below) & blocking wait can't fail anymore
  • fix test case that relied on OS scheduling via time.sleep
  • fix deadlock detection in single-threaded builds & support single-threaded test cases
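
To illustrate the simplified interface (void set/reset and an infallible wait), here is a hypothetical spin-based stand-in in present-day Zig. The PR's version blocks through OS primitives instead of spinning; only the interface shape is the point:

```zig
const std = @import("std");

// Hypothetical stand-in showing the interface shape only: set() and
// reset() return void, and wait() cannot fail. The actual ResetEvent
// blocks via per-platform primitives rather than spinning.
const ResetEvent = struct {
    is_set: bool = false,

    pub fn set(self: *ResetEvent) void {
        @atomicStore(bool, &self.is_set, true, .release);
    }

    pub fn reset(self: *ResetEvent) void {
        @atomicStore(bool, &self.is_set, false, .monotonic);
    }

    pub fn wait(self: *ResetEvent) void {
        while (!@atomicLoad(bool, &self.is_set, .acquire)) {
            std.Thread.yield() catch {};
        }
    }
};
```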

Mutex

  • fix missing spin_count increment, which made Mutex effectively a SpinLock (see the slow-path sketch below)
  • updated atomic ops (orderings for correctness, loads & yields for performance)
  • specialized implementation per platform for performance
  • added tryAcquire(): ?Held
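
As a sketch of why the missing increment mattered: the slow path spins a bounded number of times before blocking, and without bumping the counter the blocking branch is unreachable. The shape below is hypothetical present-day Zig, not the actual std.Mutex internals (the real slow path parks the thread on an OS wait primitive and uses a richer state machine):

```zig
const std = @import("std");

// Illustrative adaptive slow path. Without the `spin += 1` below, the
// loop never falls through to the blocking branch, so the Mutex behaves
// exactly like a spinlock.
fn acquireSlow(state: *u32) void {
    var spin: u32 = 0;
    const max_spin = 40; // made-up bound before giving up on spinning

    while (true) {
        // 0 = unlocked, 1 = locked (simplified two-state protocol).
        if (@cmpxchgWeak(u32, state, 0, 1, .acquire, .monotonic) == null)
            return; // acquired

        if (spin < max_spin) {
            spin += 1; // the increment this PR restores
            std.atomic.spinLoopHint();
        } else {
            // A real implementation parks the thread here via an OS
            // wait primitive; yielding stands in for that.
            std.Thread.yield() catch {};
        }
    }
}
```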

@andrewrk (Member)

Thanks for the PR. What kind of differences did you observe in the example code you were testing with this?

@kprotty (Member, Author) commented Dec 16, 2019

@andrewrk These were only the results on my dev machine (a Ryzen 5 2600), so they may not be representative; a sketch of the kind of harness behind them follows the numbers. I have also yet to see the effects in actual software, and plan to check out the zig-async-demo repo for more testing soon.

For SpinLock:

  • Linux and macOS Catalina saw roughly a 2x improvement for highly contended, small critical sections (SCS)
  • Windows improved by about 2x as well, but for large critical sections (LCS)

For Mutex:

  • on Linux, it stopped eating CPU when idle; 20-30% behind glibc pthread_mutex for SCS, about even for LCS
  • on macOS Catalina, same situation, but 2x faster than pthread_mutex in both
  • on Windows, about 10-25% faster than SRWLock in both
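
The harness behind the "highly contended, small critical section" numbers might look like the following. This is entirely hypothetical: the thread count, iteration count, and the use of present-day std.Thread.Mutex are my own choices, not the PR's benchmark code:

```zig
const std = @import("std");

// Hypothetical contended microbenchmark: several threads hammer one
// mutex, each critical section doing only a single increment (the
// "small critical section" case). All numbers are illustrative.
const num_threads = 4;
const iters_per_thread = 1_000_000;

fn worker(mutex: *std.Thread.Mutex, counter: *u64) void {
    var i: usize = 0;
    while (i < iters_per_thread) : (i += 1) {
        mutex.lock(); // small critical section: one increment
        counter.* += 1;
        mutex.unlock();
    }
}

pub fn main() !void {
    var mutex = std.Thread.Mutex{};
    var counter: u64 = 0;

    var timer = try std.time.Timer.start();
    var threads: [num_threads]std.Thread = undefined;
    for (&threads) |*t|
        t.* = try std.Thread.spawn(.{}, worker, .{ &mutex, &counter });
    for (threads) |t|
        t.join();

    std.debug.print("{d} increments in {d} ms\n", .{
        counter,
        timer.read() / std.time.ns_per_ms,
    });
}
```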

@kprotty (Member, Author) commented Dec 16, 2019

Regarding the differences in zig-async-demo compiled with release-safe: the outcomes were similar for both fact-await and sieve under GNU perf. The new structures took a bit longer to complete (~200-500 ms) across various iteration counts, but used 60% fewer cycles, and the hottest path moved from the "is locked" check in std.mutex.acquireSlow() to other places such as std.math.big.int.Int.llmullacc and std.event.Channel.put, since the Mutex is no longer a spinlock.

@JesseRMeyer

> The new structures took a bit longer to complete (~200-500 ms)

How much total time did the old structures take to complete?

@andrewrk (Member)

These test failures look legit

@kprotty (Member, Author) commented Dec 16, 2019

@JesseRMeyer I should note that the 200-500 ms is the difference rather than the absolute time to complete; the absolute time varies with the machine and the number of iterations specified in each run case.

@JesseRMeyer

> @JesseRMeyer I should note that the 200-500 ms is the difference rather than the absolute time to complete; the absolute time varies with the machine and the number of iterations specified in each run case.

Right. A ratio here would be a bit more meaningful.

@kprotty (Member, Author) commented Dec 16, 2019

@JesseRMeyer ah, you're right. It's around 0.9-1.2% slower under release-safe.

@andrewrk (Member)

@kprotty: @LemonBoy mentioned in #3932 that it would fix the test failures here. Mind rebasing this PR?
