Description
While discussing #109290 with @huixie90 and Apple folks, we noticed that we didn't use the underlying platform's API (e.g. __ulock_wait on Apple, futex on Linux) even in cases where we could. This is caused by the brittle way in which we distinguish between the platform-native contention state and other states (e.g. here):
_LIBCPP_EXPORTED_FROM_ABI void __cxx_atomic_notify_one(__cxx_atomic_contention_t const volatile* __location) noexcept; // fast
_LIBCPP_EXPORTED_FROM_ABI void __cxx_atomic_notify_one(void const volatile* __location) noexcept; // slow
We basically forward only atomics which are exactly of the right type to the fast path, and everything else gets the slow path. For example, on Apple __cxx_atomic_contention_t is int64_t. That means that if we try to wait on a std::atomic<uint64_t>, we'll end up in the slow path even though we could clearly use the underlying platform's wait mechanism, which is faster.
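To illustrate the dispatch problem, here is a minimal sketch of how overload resolution picks between the two declarations. The names contention_t, Path, notify_one, path_for_int64 and path_for_uint64 are hypothetical stand-ins, not libc++'s, and contention_t is assumed to be int64_t as on Apple:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two exported overloads. They only report
// which overload was selected; they do not wait or notify anything.
using contention_t = std::int64_t; // assumption: mirrors the Apple definition

enum class Path { fast, slow };

inline Path notify_one(contention_t const volatile*) { return Path::fast; }
inline Path notify_one(void const volatile*) { return Path::slow; }

inline Path path_for_int64() {
  std::int64_t value = 0;
  // int64_t* -> int64_t const volatile* is a qualification conversion,
  // which beats the pointer conversion to void const volatile*.
  return notify_one(&value);
}

inline Path path_for_uint64() {
  std::uint64_t value = 0;
  // uint64_t* does not convert to int64_t const volatile*, so only the
  // void const volatile* overload is viable.
  return notify_one(&value);
}
```

Same size, same alignment, but the unsigned variant lands on the slow overload purely because of its type.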
To resolve this, I would suggest re-thinking the ABI boundary of the synchronization library and basically forwarding (from the headers) any type that has a natural alignment and a size of 32 or 64 bits (i.e. sizeof(T) is 4 or 8) to the underlying platform API. At the end of the day, we basically hand over a bag of bytes to the underlying platform wait; the value and the type of these bytes don't matter.
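As a rough sketch of what the header-side check could look like, the trait below accepts any type that is 4 or 8 bytes wide and naturally aligned. The name is_native_waitable and the example structs are hypothetical, and the trivially-copyable requirement is my assumption rather than part of the proposal; which widths a given platform actually accepts is outside this sketch:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait, not part of libc++: true when T could be handed to
// the platform wait as an opaque, naturally aligned 4- or 8-byte blob.
template <class T>
struct is_native_waitable
    : std::integral_constant<bool,
          std::is_trivially_copyable<T>::value && // assumption: plain bytes
          (sizeof(T) == 4 || sizeof(T) == 8) &&   // 32 or 64 bits
          alignof(T) == sizeof(T)> {};            // natural alignment

// 8 bytes wide, but typically only 4-byte aligned, so it is rejected.
struct Pair { std::int32_t a, b; };

// Same layout with forced 8-byte alignment, so it is accepted.
struct alignas(8) AlignedPair { std::int32_t a, b; };
```

With a check like this, the headers could cast the address of a qualifying object to the platform's contention type and take the fast path, falling back to the slow path for everything else. On common 64-bit targets, std::uint64_t and float qualify while char and Pair do not.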