Conversation
|
|
||
| thread_local! { | ||
| /// Thread-local flag to ensure only one RTZ instance exists per thread. | ||
| /// This prevents multiple concurrent modifications to the FPCR register. |
There was a problem hiding this comment.
I don't think this is sufficient. Both async and rayon make it such that work can be scheduled inside other tasks.
A simpler API with (maybe?) stronger guarantees is:
with_rounding_mode(mode: Mode, f: FnMut() -> R) -> R;Where f not being Send or async guarantees most of what we need. But then you still need to be careful what you do inside the closure (e.g. don't use rayon).
Plus, as discussed earlier, Rust itself doesn't support nonstandard rounding modes, leading to UB. So this whole function still needs to be flagged unsafe.
I expect it to be used as
fn skyscraper(a: F, b: F) -> F {
unsafe {
with_rounding_mode(Mode::RTZ, || {
skyscraper_inner(a, b)
})
}
}(and similarly for a batched version to reduce overhead)
There was a problem hiding this comment.
Also would be good to link to the Rust issue in the code documentation. There's basically no safe way to do this, so any user of the API should be well informed.
There was a problem hiding this comment.
It would be nice if skyscraper_inner can only be called with a correctly set rounding mode. We can enforce this with the type system, much like in the current version:
struct ModeProof<const mode: Mode>(PhantomData<Mode>); // Only private constructors.
with_rounding_mode<const mode: Mode>(f: FnMut(&ModeProof<Mode>) -> R) -> R;fn skyscraper(a: F, b: F) -> F {
unsafe {
with_rounding_mode<Mode::RTZ>(|rtz| {
skyscraper_inner(rtz, a, b)
})
}
}(In fact I think the compiler can infer the mode from the skyscraper_inner signature)
There was a problem hiding this comment.
My comment there is confusing. The thread_local + Cell is to prevent having two instances of RTZ (in a thread) because if you have two they could get dropped in the wrong order and the rounding mode will be wrong.
RTZ itself is marked as not Send with _no_send: PhantomData<*mut ()>. I went for that because the feature negative_impl isn't stable yet.
I do like the with_rounding_mode<Mode> and have an idea to make it safer and get rid of the thread_local + cell.
| /// - The write operation is atomic | ||
| #[cfg(target_arch = "aarch64")] | ||
| #[inline] | ||
| pub fn write(&self, value: u64) { |
There was a problem hiding this comment.
How do we know for sure the compiler won't re-order instructions around this? Is that guaranteed by and asm! block? If not, are the barriers we can add here to force the compiler?
There was a problem hiding this comment.
The asm! block provides no guarantee it's the lifetime that does it. It ensures that the drop of RTZ, which calls write, will be after the last use of RTZ. It's modelled after how lifetimes of pointers around FFI are tracked. There you don't want to have your free suddenly be in front of the call using the pointer.
I'll look for official documentation or code in std to substantiate this design.
There was a problem hiding this comment.
I'm exploring other options for this. It turns out that the lifetime by itself is not sufficient under certain inlining conditions. In a variant without Drop I've been able to let the compiler place the restore of FPCR before the dependant computation.
This PR introduces the crate “rtz” to set the floating-point rounding mode to Round Toward Zero (RTZ). RTZ is necessary to be able to perform integer multiplications in the mantissa of a floating point.
Motivation
Rust/LLVM does not have any built-ins for changing the float point mode rounding modes and
this module provides abstractions to able to control the AArch64 FPCR (Floating-point Control Register) rounding mode in safe way* by employing lifetimes.
*This won’t prevent the Rust compiler from making invalid inferences as described here. For our application safety is 1) thread-safety and 2) ensuring that operations are performed under the right rounding mode. Without lifetimes the compiler will reorder the (re)setting of the rounding mode causing the dependant operation to be run under the wrong rounding mode.
The following PRs built on top of this one: