`f128` `from_bits` / `to_bits` sometimes gets reversed on ppc #125102

tgross35 · 2024-05-14T03:20:31Z

It seems like using transmute causes the upper and lower 64-bit halves to be loaded reversed. More at https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/f128.20system.20libraries.20noncompliant.20platforms/near/438489330

#![feature(f128)]

#[no_mangle]
#[inline(never)]
fn add_entry(a: f128, b: f128) -> f128 {
    a + b
}

fn main() {
    let a = f128::from_bits(0x0);
    let b = f128::from_bits(0x1);
    dbg!(a, b);
    let c = add_entry(a, b);
    dbg!(c);
}

Espected b to print as 0x00000000000000000000000000000001, but with rustc add_test.rs -o add_test.rust --target powerpc64-unknown-linux-gnu -Clinker=powerpc64-linux-gnu-gcc -Ctarget-cpu=pwr9 it instead prints 0x00000000000000010000000000000000.

The text was updated successfully, but these errors were encountered:

tgross35 · 2024-05-14T03:22:25Z

@rustbot label +F-f16_and_f128 +A-ABI +T-libs +C-bug -needs-triage +A-llvm

RalfJung · 2024-05-14T11:27:05Z

If we define f128 to be the corresponding IEEE float format, then the transmute is by definition correct. That float format defines the representation in memory and I think it's endianess must work out exactly the same as u128? Cc @eddyb @jcranmer @Muon

So at first this looks like a platform bug to me. Certainly we'd have to change a lot more than just from_bits / to_bits if this was deemed correct behavior -- codegen and Miri also need to know how to convert between the raw bits an a floating-point value. Do LLVM optimizations do anything to handle this?

RalfJung · 2024-05-14T11:41:05Z

Are you sure this is not an ABI bug, a mismatch between caller and callee in how the arguments are interpreted? That seems more likely to me than an in-memory float format with mixed endianess.

Does it make any difference if you remove the from_bits call from the equation, and do the transmute inline via a union instead?

#![feature(f128)]

#[no_mangle]
#[inline(never)]
fn add_entry(a: f128, b: f128) -> f128 {
    a + b
}

union U {
    i: u128,
    f: f128,
}

fn main() { unsafe {
    let a = U { i: 0x0 };
    let b = U { i: 0x1 };
    dbg!(a.f, b.f);
    let c = add_entry(a.f, b.f);
    dbg!(c);
} }

tgross35 · 2024-05-14T12:01:30Z

Yeah there is definitely something more going on here, setting opt-level=3 makes the problem go away

tgross35 · 2024-05-14T12:11:08Z

Slightly reduced

$ cat transmute.rs
#![feature(f128)]

fn main() {
    let a = f128::from_bits(0x1);
    dbg!(a);
}
$ rustc transmute.rs --target powerpc64-unknown-linux-gnu -C linker=powerpc64-linux-gnu-gcc -Ctarget-cpu=pwr9 -o transmute.rs.ppc64.pwr9
$ qemu-ppc64 -L /usr/powerpc64-linux-gnu/ transmute.rs.ppc64.pwr9                                                                                                                                                                    
[transmute.rs:5:5] a = 0x00000000000000010000000000000000

Any idea how to narrow further?

tgross35 · 2024-05-14T12:16:33Z

Your suggestion does indeed fix the issue:

$ cat transmute_union.rs                                                                                                                                                                                                             24-05-14 08:15:29
#![feature(f128)]

union U {
    i: u128,
    f: f128,
}

fn main() {
    let a = U { i: 0x1 };
    dbg!(unsafe { a.f });
}
$ rustc transmute_union.rs --target powerpc64-unknown-linux-gnu -C linker=powerpc64-linux-gnu-gcc -Ctarget-cpu=pwr9 -o transmute_union.rs.ppc64.pwr9
$ qemu-ppc64 -L /usr/powerpc64-linux-gnu/ transmute_union.rs.ppc64.pwr9
[transmute_union.rs:10:5] unsafe { a.f } = 0x00000000000000000000000000000001

edit: but turn on -O and it’s borked again. Interesting.

RalfJung · 2024-05-14T12:41:40Z

Any idea how to narrow further?

I'd define from_bits locally in this crate and add more debugging to see if it's passing arguments to that function where things break, or returning them from that function.

Muon · 2024-05-14T12:41:45Z

If we define f128 to be the corresponding IEEE float format, then the transmute is by definition correct. That float format defines the representation in memory and I think it's endianess must work out exactly the same as u128?

IEEE 754 does not specify memory layout. It just gives interpretations for 128-bit integers, but nothing prevents hardware from having, say, little-endian integer arithmetic and big-endian floating-point arithmetic. That said, AFAICT, PowerPC is big-endian for everything. More relevantly, however, PowerPC does not have 128-bit floating-point support, so this is being emulated in software anyway.

Muon · 2024-05-14T12:52:50Z

Correction: Power9 does in fact have native IEEE 754 binary128 support, as part of VSX, according to its architecture manual. And indeed it is big endian. Older PowerPC doesn't, but I did see that the compilation command line did request Power9.

RalfJung · 2024-05-14T12:57:00Z

0x00000000000000010000000000000000 would indicate some sort of weird mixed-endian format. So I can't believe that that's the actual intended format of anything, it looks more like a bug (possible more than one bug) in argument passing... and possible optimizer bugs as well, given that with -O even the union-based variant breaks.

In fact maybe it's all an optimizer bug; from_bits is from the stdlib after all, so built with optimizations.

tgross35 · 2024-05-14T18:59:59Z

@ecnelises or @lu-zero any chance you have any ideas here? It is looking like this has to be a bug in LLVM, just don't know exactly what.

Muon · 2024-05-15T00:39:55Z

I saw this message from Qiu Chaofan on Zulip, and thought I'd test whether the memory representation of the smallest positive f128 differs from the representation of 1 as a u128.

I got an ICE due to unimplemented stuff in const eval in Rust, so I had to resort to C (gcc, clang). However, there were no differences in the representation.

This isn't conclusive yet since the two groups of 4 bytes in the middle could still be swapped around, so I did another test with no repeats there (gcc, clang). Again, no differences.

It seems that both GCC and Clang assume that u128 is big endian, just like f128, even if the loads and stores in that case are conducted as two lots of 64 bits.

RalfJung · 2024-05-15T05:55:27Z

even if the loads and stores in that case are conducted as two lots of 64 bits.

But that should be an implementation detail, right, and not observable? When the two halves are strung together to form f128, endianess needs to be taken into account properly.

I think the best starting point is figuring out why the optimizer breaks

#![feature(f128)]

union U {
    i: u128,
    f: f128,
}

fn main() {
    let a = U { i: 0x1 };
    dbg!(unsafe { a.f });
}

I see no way that's not an LLVM bug, so it might be worth reporting there (after some further minimization to remove the debug machinery from the equation).

Muon · 2024-05-15T06:39:54Z

even if the loads and stores in that case are conducted as two lots of 64 bits.

But that should be an implementation detail, right, and not observable? When the two halves are strung together to form f128, endianess needs to be taken into account properly.

Yes, that's right. I was just checking whether u128 was stored with the lower 64 bits before the higher 64 bits. That would mess it up and look mixed endian. But that's not the case. That is, u128 is stored big endian, so it's definitely an LLVM bug.

tgross35 · 2024-05-15T09:03:21Z

Something must be quite wonky on the LLVM side here, accidentally tried to compile the same thing for 32-bit ppc with the pwr9 target-cpu and it gives me a SIGILL. Not sure if pwr9 is even valid or not but an error message would have been nice :)

$ rustc transmute.rs --target powerpc-unknown-linux-gnu -C linker=powerpc-linux-gnu-gcc -Ctarget-cpu=pwr9 -o transmute.rs.ppc.pwr9
Illegal instruction

For reference:

rustc 1.80.0-nightly (9c9b56879 2024-05-05)
binary: rustc
commit-hash: 9c9b568792ef20d8459c745345dd3e79b7c7fa8c
commit-date: 2024-05-05
host: x86_64-unknown-linux-gnu
release: 1.80.0-nightly
LLVM version: 18.1.4

bt via gdb:

(gdb) file rustc
Reading symbols from rustc...
(gdb) run transmute.rs --target powerpc-unknown-linux-gnu -C linker=powerpc-linux-gnu-gcc -Ctarget-cpu=pwr9 -o transmute.rs.ppc.pwr9
Starting program: /home/tmgross/.cargo/bin/rustc transmute.rs --target powerpc-unknown-linux-gnu -C linker=powerpc-linux-gnu-gcc -Ctarget-cpu=pwr9 -o transmute.rs.ppc.pwr9

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
process 8177 is executing new program: /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe95ff6c0 (LWP 8189)]
[New Thread 0x7fffe8dff6c0 (LWP 8190)]
[New Thread 0x7fffe36a66c0 (LWP 8191)]
[New Thread 0x7fffe2fff6c0 (LWP 8192)]
[New Thread 0x7fffe27ff6c0 (LWP 8193)]
[New Thread 0x7fffe1fff6c0 (LWP 8194)]
[Thread 0x7fffe27ff6c0 (LWP 8193) exited]

Thread 7 "opt cgu.0" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7fffe1fff6c0 (LWP 8194)]
0x00007fffeee2a599 in llvm::PPCTargetLowering::LowerOperation(llvm::SDValue, llvm::SelectionDAG&) const ()
   from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
(gdb) b
Breakpoint 1 at 0x7fffeee2a599
(gdb) bt
#0  0x00007fffeee2a599 in llvm::PPCTargetLowering::LowerOperation(llvm::SDValue, llvm::SelectionDAG&) const ()
   from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#1  0x00007ffff0227d99 in llvm::TargetLowering::LowerOperationWrapper(llvm::SDNode*, llvm::SmallVectorImpl<llvm::SDValue>&, llvm::SelectionDAG&) const ()
   from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#2  0x00007ffff0c68ac0 in llvm::DAGTypeLegalizer::ExpandIntegerOperand(llvm::SDNode*, unsigned int) [clone .cold] ()
   from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#3  0x00007ffff0389b36 in llvm::DAGTypeLegalizer::run() [clone .warm] () from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#4  0x00007fffefb71b09 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#5  0x00007fffefbd92c2 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#6  0x00007fffefb167ea in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) () from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#7  0x00007fffeede2b1e in (anonymous namespace)::PPCDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) ()
   from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#8  0x00007fffefb0ee14 in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#9  0x00007fffefb0e285 in llvm::FPPassManager::runOnModule(llvm::Module&) () from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#10 0x00007ffff0074ca0 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/libLLVM.so.18.1-rust-1.80.0-nightly
#11 0x00007ffff6b58d10 in LLVMRustWriteOutputFile () from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/librustc_driver-c823c964cb628bbb.so
#12 0x00007ffff6b58928 in rustc_codegen_llvm::back::write::write_output_file () from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/librustc_driver-c823c964cb628bbb.so
#13 0x00007ffff6b565af in rustc_codegen_llvm::back::write::codegen () from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/librustc_driver-c823c964cb628bbb.so
#14 0x00007ffff6b56285 in rustc_codegen_ssa::back::write::finish_intra_module_work::<rustc_codegen_llvm::LlvmCodegenBackend> ()
   from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/librustc_driver-c823c964cb628bbb.so
#15 0x00007ffff6b59377 in std::sys_common::backtrace::__rust_begin_short_backtrace::<<rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::ExtraBackendMethods>::spawn_named_thread<rustc_codegen_ssa::back::write::spawn_work<rustc_codegen_llvm::LlvmCodegenBackend>::{closure#0}, ()>::{closure#0}, ()> ()
   from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/librustc_driver-c823c964cb628bbb.so
#16 0x00007ffff6b58e5b in <<std::thread::Builder>::spawn_unchecked_<<rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::ExtraBackendMethods>::spawn_named_thread<rustc_codegen_ssa::back::write::spawn_work<rustc_codegen_llvm::LlvmCodegenBackend>::{closure#0}, ()>::{closure#0}, ()>::{closure#2} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0} ()
   from /home/tmgross/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/librustc_driver-c823c964cb628bbb.so
#17 0x00007ffff19997bb in alloc::boxed::{impl#48}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:2022
#18 alloc::boxed::{impl#48}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:2022
#19 std::sys::pal::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/pal/unix/thread.rs:108
#20 0x00007ffff1760b5a in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#21 0x00007ffff17f15fc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Edit: reported this one at llvm/llvm-project#92233

tgross35 · 2024-05-15T11:24:49Z

I at least was able to reduce the original issue enough to remove dbg! but it could probably be reduced further. Reported to LLVM at llvm/llvm-project#92246

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 14, 2024

tgross35 changed the title ~~Transmute for from_bits / to_bits may not work on ppc~~ f128 transmute for from_bits / to_bits may not work on ppc May 14, 2024

tgross35 changed the title ~~f128 transmute for from_bits / to_bits may not work on ppc~~ f128 from_bits / to_bits sometimes gets reversed on ppc May 14, 2024

tgross35 mentioned this issue May 14, 2024

Tracking Issue for f16 and f128 float types #116909

Open

3 tasks

tgross35 mentioned this issue May 15, 2024

[PowerPC] power9 f128 ABI issue: bitcasts can reverse halves llvm/llvm-project#92246

Open

rustbot added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`f128` `from_bits` / `to_bits` sometimes gets reversed on ppc #125102

`f128` `from_bits` / `to_bits` sometimes gets reversed on ppc #125102

tgross35 commented May 14, 2024

tgross35 commented May 14, 2024 •

edited

RalfJung commented May 14, 2024 •

edited

RalfJung commented May 14, 2024 •

edited

tgross35 commented May 14, 2024 •

edited

tgross35 commented May 14, 2024 •

edited

tgross35 commented May 14, 2024 •

edited

RalfJung commented May 14, 2024

Muon commented May 14, 2024

Muon commented May 14, 2024 •

edited

RalfJung commented May 14, 2024 •

edited

tgross35 commented May 14, 2024 •

edited

Muon commented May 15, 2024 •

edited

RalfJung commented May 15, 2024

Muon commented May 15, 2024

tgross35 commented May 15, 2024 •

edited

tgross35 commented May 15, 2024 •

edited

f128 from_bits / to_bits sometimes gets reversed on ppc #125102

f128 from_bits / to_bits sometimes gets reversed on ppc #125102

Comments

tgross35 commented May 14, 2024

tgross35 commented May 14, 2024 • edited

RalfJung commented May 14, 2024 • edited

RalfJung commented May 14, 2024 • edited

tgross35 commented May 14, 2024 • edited

tgross35 commented May 14, 2024 • edited

tgross35 commented May 14, 2024 • edited

RalfJung commented May 14, 2024

Muon commented May 14, 2024

Muon commented May 14, 2024 • edited

RalfJung commented May 14, 2024 • edited

tgross35 commented May 14, 2024 • edited

Muon commented May 15, 2024 • edited

RalfJung commented May 15, 2024

Muon commented May 15, 2024

tgross35 commented May 15, 2024 • edited

tgross35 commented May 15, 2024 • edited

`f128` `from_bits` / `to_bits` sometimes gets reversed on ppc #125102

`f128` `from_bits` / `to_bits` sometimes gets reversed on ppc #125102

tgross35 commented May 14, 2024 •

edited

RalfJung commented May 14, 2024 •

edited

RalfJung commented May 14, 2024 •

edited

tgross35 commented May 14, 2024 •

edited

tgross35 commented May 14, 2024 •

edited

tgross35 commented May 14, 2024 •

edited

Muon commented May 14, 2024 •

edited

RalfJung commented May 14, 2024 •

edited

tgross35 commented May 14, 2024 •

edited

Muon commented May 15, 2024 •

edited

tgross35 commented May 15, 2024 •

edited

tgross35 commented May 15, 2024 •

edited