Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompilation in release mode for f32 on 64-bits platforms #13830

Closed
gwenzek opened this issue Dec 8, 2022 · 4 comments
Closed

Miscompilation in release mode for f32 on 64-bits platforms #13830

gwenzek opened this issue Dec 8, 2022 · 4 comments
Labels
bug Observed behavior contradicts documented or intended behavior
Milestone

Comments

@gwenzek
Copy link
Contributor

gwenzek commented Dec 8, 2022

Zig Version

0.10.0+4be1bb4aa

Steps to Reproduce and Observed Behavior

https://godbolt.org/z/6cfEPnao3

const S = extern struct { v1: f32 };

pub export fn assert_S(lv: S) u8 {
    var err: u8 = 0;
    if (lv.v1 != 2.5) err = 1;
    return err;
}

On debug mode this compiles and runs fine

With -DReleaseFast this compiles on x86_64-linux to:

assert_S:
        mov     al, 1
        ret

The same issue also appears for aarch64, I haven't tested with other platforms.

The generated LLVM IR looks correct to me, but I'd like a sanity check before opening a LLVM issue.

; ModuleID = 'f'
source_filename = "f"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%f.S = type { float }

@builtin.zig_backend = internal unnamed_addr constant i64 2, align 8
@builtin.output_mode = internal unnamed_addr constant i2 -2, align 1

; Function Attrs: nounwind
define dso_local zeroext i8 @assert_S(double %0) #0 {
Entry:
  %1 = alloca i8, align 1
  %2 = alloca %f.S, align 4
  %3 = getelementptr inbounds { double }, ptr %2, i32 0, i32 0
  store double %0, ptr %3, align 8
  store i8 0, ptr %1, align 1
  %4 = getelementptr inbounds %f.S, ptr %2, i32 0, i32 0
  %5 = load float, ptr %4, align 4
  %6 = fcmp une float %5, 2.500000e+00
  br i1 %6, label %Then, label %Block

Then:                                             ; preds = %Entry
  store i8 1, ptr %1, align 1
  br label %Block

Block:                                            ; preds = %Entry, %Then
  %7 = load i8, ptr %1, align 1
  ret i8 %7
}

; Function Attrs: mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

attributes #0 = { nounwind "frame-pointer"="none" "target-cpu"="x86-64" "target-features"="-16bit-mode,-32bit-mode,-3dnow,-3dnowa,+64bit,+adx,+aes,-amx-bf16,-amx-int8,-amx-tile,+avx,+avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512fp16,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vp2intersect,-avx512vpopcntdq,-avxvnni,+bmi,+bmi2,-branchfusion,-cldemote,+clflushopt,+clwb,-clzero,+cmov,-crc32,+cx16,+cx8,-enqcmd,-ermsb,+f16c,-false-deps-getmant,-false-deps-lzcnt-tzcnt,-false-deps-mulc,-false-deps-mullq,-false-deps-perm,-false-deps-popcnt,-false-deps-range,-fast-11bytenop,-fast-15bytenop,-fast-7bytenop,-fast-bextr,-fast-gather,-fast-hops,-fast-lzcnt,-fast-movbe,-fast-scalar-fsqrt,-fast-scalar-shift-masks,-fast-shld-rotate,-fast-variable-crosslane-shuffle,-fast-variable-perlane-shuffle,-fast-vector-fsqrt,-fast-vector-shift-masks,+fma,-fma4,+fsgsbase,-fsrm,+fxsr,+gfni,-harden-sls-ijmp,-harden-sls-ret,-hreset,-idivl-to-divb,+idivq-to-divl,+invpcid,-kl,-lea-sp,-lea-uses-ag,-lvi-cfi,-lvi-load-hardening,-lwp,+lzcnt,+macrofusion,+mmx,+movbe,+movdir64b,+movdiri,-mwaitx,+nopl,-pad-short-functions,+pclmul,-pconfig,+pku,+popcnt,-prefer-128-bit,-prefer-256-bit,-prefer-mask-registers,-prefetchwt1,+prfchw,+ptwrite,+rdpid,-rdpru,+rdrnd,+rdseed,-retpoline,-retpoline-external-thunk,-retpoline-indirect-branches,-retpoline-indirect-calls,-rtm,+sahf,-sbb-dep-breaking,-serialize,-seses,-sgx,+sha,+shstk,+slow-3ops-lea,+slow-incdec,-slow-lea,-slow-pmaddwd,-slow-pmulld,-slow-shld,-slow-two-mem-ops,-slow-unaligned-mem-16,-slow-unaligned-mem-32,-soft-float,+sse,+sse2,+sse3,+sse4.1,+sse4.2,-sse4a,-sse-unaligned-mem,+ssse3,-tagged-globals,-tbm,-tsxldtrk,-uintr,-use-glm-div-sqrt-costs,-use-slm-arith-costs,+vaes,+vpclmulqdq,+vzeroupper,+waitpkg,-wbnoinvd,-widekl,+x87,-xop,+xsave,+xsavec,+xsaveopt,+xsaves" }
attributes #1 = { mustprogress nocallback nofree nosync nounwind readnone speculatable willreturn }

The LLVM IR starts to look weird after the "SROA" optimization pass (Scalar Replacement of Aggregates).

~/github/zig-bootstrap/out/build-llvm-host/bin/opt releasefast.zig.5.ll --sroa > releasefast.zig.sroa.bc
 ~/github/zig-bootstrap/out/build-llvm-host/bin/llvm-dis releasefast.zig.sroa.bc

... 
%1 = fcmp une float undef, 2.500000e+00

Note that I haven't been able to reproduce the issue in Clang because Clang lowers float fields to float.
So is Zig generating invalid LLVM IR here ? Should we only use float field in LLVM structs to represents f32 ?

Expected Behavior

The code should be valid both in -ODebug and -OReleaseFast.

@gwenzek gwenzek added the bug Observed behavior contradicts documented or intended behavior label Dec 8, 2022
@gwenzek
Copy link
Contributor Author

gwenzek commented Dec 8, 2022

the thing I don't get is I have different results with llc on my laptop and on godbolt: https://godbolt.org/z/jcrbnMvWz (trunk and 15.0.0)
could it be that Zig is using a bugged version of LLVM (15.0.3) ?

@LiterallyVoid
Copy link
Contributor

The LLVM IR code emitted is wrong.

The struct is a stack allocation of type %f.S, which is a structure with a single f32 ({ float }). Then, when populating this structure with the function argument, Zig miscompiles, casts the pointer to a double and writes the function argument there; later, the same structure is read assuming that it contains a float.

@gwenzek
Copy link
Contributor Author

gwenzek commented Jan 2, 2023

Yes, we are casting between {float} to double, but why is that invalid? If you tests with other kind of struct you'll see that Zig does other casting, and that doesn't seem to be causing any issue.

with {v1: f32, v2: f32} we are casting from {float, float} to double.
with {v1: f32, v2: u8} we are casting from type { float, i8, [3 x i8] } to i64
with {v1: f32, v2: f16} we are casting from type { float, half, [3 x i8] } to double

The pattern I'm seeing here is that we can only cast to something of the same size and alignment, which seems reasonable. If you know a relevant link in LLVM doc I'm interested. I'm going to try to fix this.

@AlxHnr
Copy link
Contributor

AlxHnr commented Jan 11, 2023

Float structs break for me only when they are smaller than 16 bytes, see #14262. Adding a 4th float to a struct (or more) mitigates the problem for whatever reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior
Projects
None yet
Development

No branches or pull requests

4 participants