Skip to content

Round towards Zero#28

Merged
recmo merged 14 commits into
mainfrom
xr/rtz
May 6, 2025
Merged

Round towards Zero#28
recmo merged 14 commits into
mainfrom
xr/rtz

Conversation

@xrvdg
Copy link
Copy Markdown
Collaborator

@xrvdg xrvdg commented May 1, 2025

This PR introduces the crate “rtz” to set the floating-point rounding mode to Round Toward Zero (RTZ). RTZ is necessary to be able to perform integer multiplications in the mantissa of a floating point.

Motivation
Rust/LLVM does not have any built-ins for changing the float point mode rounding modes and
this module provides abstractions to able to control the AArch64 FPCR (Floating-point Control Register) rounding mode in safe way* by employing lifetimes.

*This won’t prevent the Rust compiler from making invalid inferences as described here. For our application safety is 1) thread-safety and 2) ensuring that operations are performed under the right rounding mode. Without lifetimes the compiler will reorder the (re)setting of the rounding mode causing the dependant operation to be run under the wrong rounding mode.

The following PRs built on top of this one:

@xrvdg xrvdg marked this pull request as draft May 1, 2025 02:41
@xrvdg xrvdg requested review from Dzejkop and recmo May 2, 2025 06:17
@xrvdg xrvdg changed the title Round-to-Zero Round towards Zero May 2, 2025
@xrvdg xrvdg self-assigned this May 2, 2025
@xrvdg xrvdg marked this pull request as ready for review May 2, 2025 06:18
Comment thread rtz/src/lib.rs Outdated

thread_local! {
/// Thread-local flag to ensure only one RTZ instance exists per thread.
/// This prevents multiple concurrent modifications to the FPCR register.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is sufficient. Both async and rayon make it such that work can be scheduled inside other tasks.

A simpler API with (maybe?) stronger guarantees is:

with_rounding_mode(mode: Mode, f: FnMut() -> R) -> R;

Where f not being Send or async guarantees most of what we need. But then you still need to be careful what you do inside the closure (e.g. don't use rayon).

Plus, as discussed earlier, Rust itself doesn't support nonstandard rounding modes, leading to UB. So this whole function still needs to be flagged unsafe.

I expect it to be used as

fn skyscraper(a: F, b: F) -> F {
  unsafe {
    with_rounding_mode(Mode::RTZ, || {
       skyscraper_inner(a, b)
    })
  }
}

(and similarly for a batched version to reduce overhead)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also would be good to link to the Rust issue in the code documentation. There's basically no safe way to do this, so any user of the API should be well informed.

Copy link
Copy Markdown
Contributor

@recmo recmo May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if skyscraper_inner can only be called with a correctly set rounding mode. We can enforce this with the type system, much like in the current version:

struct ModeProof<const mode: Mode>(PhantomData<Mode>); // Only private constructors.

with_rounding_mode<const mode: Mode>(f: FnMut(&ModeProof<Mode>) -> R) -> R;
fn skyscraper(a: F, b: F) -> F {
  unsafe {
    with_rounding_mode<Mode::RTZ>(|rtz| {
       skyscraper_inner(rtz, a, b)
    })
  }
}

(In fact I think the compiler can infer the mode from the skyscraper_inner signature)

Copy link
Copy Markdown
Collaborator Author

@xrvdg xrvdg May 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment there is confusing. The thread_local + Cell is to prevent having two instances of RTZ (in a thread) because if you have two they could get dropped in the wrong order and the rounding mode will be wrong.

RTZ itself is marked as not Send with _no_send: PhantomData<*mut ()>. I went for that because the feature negative_impl isn't stable yet.

I do like the with_rounding_mode<Mode> and have an idea to make it safer and get rid of the thread_local + cell.

Comment thread rtz/src/lib.rs Outdated
/// - The write operation is atomic
#[cfg(target_arch = "aarch64")]
#[inline]
pub fn write(&self, value: u64) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know for sure the compiler won't re-order instructions around this? Is that guaranteed by and asm! block? If not, are the barriers we can add here to force the compiler?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asm! block provides no guarantee it's the lifetime that does it. It ensures that the drop of RTZ, which calls write, will be after the last use of RTZ. It's modelled after how lifetimes of pointers around FFI are tracked. There you don't want to have your free suddenly be in front of the call using the pointer.

I'll look for official documentation or code in std to substantiate this design.

Copy link
Copy Markdown
Collaborator Author

@xrvdg xrvdg May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm exploring other options for this. It turns out that the lifetime by itself is not sufficient under certain inlining conditions. In a variant without Drop I've been able to let the compiler place the restore of FPCR before the dependant computation.

@xrvdg xrvdg marked this pull request as draft May 5, 2025 12:19
@xrvdg xrvdg marked this pull request as ready for review May 6, 2025 10:14
@recmo recmo merged commit ca4a5ec into main May 6, 2025
0 of 2 checks passed
dcbuild3r pushed a commit that referenced this pull request May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants