Miscompilation in release mode for f32 on 64-bits platforms #13830

gwenzek · 2022-12-08T21:06:00Z

Zig Version

0.10.0+4be1bb4aa

Steps to Reproduce and Observed Behavior

const S = extern struct { v1: f32 };

pub export fn assert_S(lv: S) u8 {
    var err: u8 = 0;
    if (lv.v1 != 2.5) err = 1;
    return err;
}

On debug mode this compiles and runs fine

With -DReleaseFast this compiles on x86_64-linux to:

assert_S:
        mov     al, 1
        ret

The same issue also appears for aarch64, I haven't tested with other platforms.

The generated LLVM IR looks correct to me, but I'd like a sanity check before opening a LLVM issue.

; ModuleID = 'f'
source_filename = "f"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%f.S = type { float }

@builtin.zig_backend = internal unnamed_addr constant i64 2, align 8
@builtin.output_mode = internal unnamed_addr constant i2 -2, align 1

; Function Attrs: nounwind
define dso_local zeroext i8 @assert_S(double %0) #0 {
Entry:
  %1 = alloca i8, align 1
  %2 = alloca %f.S, align 4
  %3 = getelementptr inbounds { double }, ptr %2, i32 0, i32 0
  store double %0, ptr %3, align 8
  store i8 0, ptr %1, align 1
  %4 = getelementptr inbounds %f.S, ptr %2, i32 0, i32 0
  %5 = load float, ptr %4, align 4
  %6 = fcmp une float %5, 2.500000e+00
  br i1 %6, label %Then, label %Block

Then:                                             ; preds = %Entry
  store i8 1, ptr %1, align 1
  br label %Block

Block:                                            ; preds = %Entry, %Then
  %7 = load i8, ptr %1, align 1
  ret i8 %7
}

; Function Attrs: mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

attributes #0 = { nounwind "frame-pointer"="none" "target-cpu"="x86-64" "target-features"="-16bit-mode,-32bit-mode,-3dnow,-3dnowa,+64bit,+adx,+aes,-amx-bf16,-amx-int8,-amx-tile,+avx,+avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512fp16,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vp2intersect,-avx512vpopcntdq,-avxvnni,+bmi,+bmi2,-branchfusion,-cldemote,+clflushopt,+clwb,-clzero,+cmov,-crc32,+cx16,+cx8,-enqcmd,-ermsb,+f16c,-false-deps-getmant,-false-deps-lzcnt-tzcnt,-false-deps-mulc,-false-deps-mullq,-false-deps-perm,-false-deps-popcnt,-false-deps-range,-fast-11bytenop,-fast-15bytenop,-fast-7bytenop,-fast-bextr,-fast-gather,-fast-hops,-fast-lzcnt,-fast-movbe,-fast-scalar-fsqrt,-fast-scalar-shift-masks,-fast-shld-rotate,-fast-variable-crosslane-shuffle,-fast-variable-perlane-shuffle,-fast-vector-fsqrt,-fast-vector-shift-masks,+fma,-fma4,+fsgsbase,-fsrm,+fxsr,+gfni,-harden-sls-ijmp,-harden-sls-ret,-hreset,-idivl-to-divb,+idivq-to-divl,+invpcid,-kl,-lea-sp,-lea-uses-ag,-lvi-cfi,-lvi-load-hardening,-lwp,+lzcnt,+macrofusion,+mmx,+movbe,+movdir64b,+movdiri,-mwaitx,+nopl,-pad-short-functions,+pclmul,-pconfig,+pku,+popcnt,-prefer-128-bit,-prefer-256-bit,-prefer-mask-registers,-prefetchwt1,+prfchw,+ptwrite,+rdpid,-rdpru,+rdrnd,+rdseed,-retpoline,-retpoline-external-thunk,-retpoline-indirect-branches,-retpoline-indirect-calls,-rtm,+sahf,-sbb-dep-breaking,-serialize,-seses,-sgx,+sha,+shstk,+slow-3ops-lea,+slow-incdec,-slow-lea,-slow-pmaddwd,-slow-pmulld,-slow-shld,-slow-two-mem-ops,-slow-unaligned-mem-16,-slow-unaligned-mem-32,-soft-float,+sse,+sse2,+sse3,+sse4.1,+sse4.2,-sse4a,-sse-unaligned-mem,+ssse3,-tagged-globals,-tbm,-tsxldtrk,-uintr,-use-glm-div-sqrt-costs,-use-slm-arith-costs,+vaes,+vpclmulqdq,+vzeroupper,+waitpkg,-wbnoinvd,-widekl,+x87,-xop,+xsave,+xsavec,+xsaveopt,+xsaves" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn }

The LLVM IR starts to look weird after the "SROA" optimization pass (Scalar Replacement of Aggregates).

~/github/zig-bootstrap/out/build-llvm-host/bin/opt releasefast.zig.5.ll --sroa > releasefast.zig.sroa.bc
 ~/github/zig-bootstrap/out/build-llvm-host/bin/llvm-dis releasefast.zig.sroa.bc

... 
%1 = fcmp une float undef, 2.500000e+00

Note that I haven't been able to reproduce the issue in Clang because Clang lowers float fields to float.
So is Zig generating invalid LLVM IR here ? Should we only use float field in LLVM structs to represents f32 ?

Expected Behavior

The code should be valid both in -ODebug and -OReleaseFast.

The text was updated successfully, but these errors were encountered:

gwenzek · 2022-12-08T21:24:09Z

the thing I don't get is I have different results with llc on my laptop and on godbolt: https://godbolt.org/z/jcrbnMvWz (trunk and 15.0.0)
could it be that Zig is using a bugged version of LLVM (15.0.3) ?

LiterallyVoid · 2022-12-11T08:29:07Z

The LLVM IR code emitted is wrong.

The struct is a stack allocation of type %f.S, which is a structure with a single f32 ({ float }). Then, when populating this structure with the function argument, Zig miscompiles, casts the pointer to a double and writes the function argument there; later, the same structure is read assuming that it contains a float.

gwenzek · 2023-01-02T21:27:34Z

Yes, we are casting between {float} to double, but why is that invalid? If you tests with other kind of struct you'll see that Zig does other casting, and that doesn't seem to be causing any issue.

with {v1: f32, v2: f32} we are casting from {float, float} to double.
with {v1: f32, v2: u8} we are casting from type { float, i8, [3 x i8] } to i64
with {v1: f32, v2: f16} we are casting from type { float, half, [3 x i8] } to double

The pattern I'm seeing here is that we can only cast to something of the same size and alignment, which seems reasonable. If you know a relevant link in LLVM doc I'm interested. I'm going to try to fix this.

AlxHnr · 2023-01-11T20:43:48Z

Float structs break for me only when they are smaller than 16 bytes, see #14262. Adding a 4th float to a struct (or more) mitigates the problem for whatever reason.

Closes ziglang#13830

gwenzek added the bug Observed behavior contradicts documented or intended behavior label Dec 8, 2022

gwenzek mentioned this issue Dec 8, 2022

add -fopt-bisect-limit #13826

Merged

Vexu mentioned this issue Jan 11, 2023

Building with Drelease-fast=true causes extern C structs to contain garbage #14262

Closed

Vexu added this to the 0.11.0 milestone Jan 11, 2023

Vexu added a commit to Vexu/zig that referenced this issue Jan 11, 2023

llvm: correctly handle C ABI structs with f32/f64 alignment differences

a510bc7

Closes ziglang#13830

Vexu added a commit to Vexu/zig that referenced this issue Jan 13, 2023

llvm: correctly handle C ABI structs with f32/f64 alignment differences

d033ebb

Closes ziglang#13830

Vexu added a commit to Vexu/zig that referenced this issue Jan 14, 2023

llvm: correctly handle C ABI structs with f32/f64 alignment differences

736e64f

Closes ziglang#13830

Vexu closed this as completed in 0013042 Jan 14, 2023

Not-Nik mentioned this issue Feb 7, 2023

DrawTextureEx, DrawTexturePro, ect, not drawing Not-Nik/raylib-zig#17

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Miscompilation in release mode for f32 on 64-bits platforms #13830

Miscompilation in release mode for f32 on 64-bits platforms #13830

gwenzek commented Dec 8, 2022 •

edited

Loading

gwenzek commented Dec 8, 2022

LiterallyVoid commented Dec 11, 2022

gwenzek commented Jan 2, 2023

AlxHnr commented Jan 11, 2023

Miscompilation in release mode for f32 on 64-bits platforms #13830

Miscompilation in release mode for f32 on 64-bits platforms #13830

Comments

gwenzek commented Dec 8, 2022 • edited Loading

Zig Version

Steps to Reproduce and Observed Behavior

Expected Behavior

gwenzek commented Dec 8, 2022

LiterallyVoid commented Dec 11, 2022

gwenzek commented Jan 2, 2023

AlxHnr commented Jan 11, 2023

gwenzek commented Dec 8, 2022 •

edited

Loading