Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine][FuncAttrs] InstCombine breaks initializes attribute #133059

Closed
dtcxzyw opened this issue Mar 26, 2025 · 6 comments · Fixed by #134370
Closed

[InstCombine][FuncAttrs] InstCombine breaks initializes attribute #133059

dtcxzyw opened this issue Mar 26, 2025 · 6 comments · Fixed by #134370
Assignees
Labels

Comments

@dtcxzyw
Copy link
Member

dtcxzyw commented Mar 26, 2025

Reproducer: https://alive2.llvm.org/ce/z/rx6rJk

; bin/opt -passes=instcombine test.ll -S

define fastcc void @src(ptr initializes((0, 48)) %0) {
  %2 = alloca [24 x i8], align 8
  store i8 0, ptr %0, align 16
  %3 = getelementptr i8, ptr %0, i64 1
  store i8 0, ptr %3, align 1
  %4 = getelementptr i8, ptr %0, i64 2
  store i16 0, ptr %4, align 2
  %5 = getelementptr i8, ptr %0, i64 4
  store i32 0, ptr %5, align 4
  %6 = getelementptr i8, ptr %0, i64 8
  store i16 0, ptr %6, align 8
  %7 = getelementptr i8, ptr %0, i64 10
  store i16 0, ptr %7, align 2
  %8 = getelementptr i8, ptr %0, i64 12
  store i32 0, ptr %8, align 4
  %9 = getelementptr i8, ptr %0, i64 16
  store double 0.000000e+00, ptr %9, align 16
  %10 = getelementptr i8, ptr %0, i64 24
  call void @llvm.memcpy.p0.p0.i64(ptr %10, ptr %2, i64 24, i1 false)
  ret void
}

After InstCombine, the last memcpy is dropped as the source buffer is uninitialized (filled with undef (in llvm)/poison (in alive2)).

Should we adjust initializes to only account for writing with well-defined values?
Generated with rustlantis + llubi.
Original rust mir program: https://gist.github.com/dtcxzyw/78250ce921833ea20c5cb040d29f1a3f

cc @haopliu @nikic

@haopliu
Copy link
Contributor

haopliu commented Mar 26, 2025

After InstCombine, the last memcpy is dropped

Would this change trigger another FunctionAttrs run that will update the initializes attribute according to the new IR?

cc @jvoung @aeubanks

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Mar 27, 2025

Would this change trigger another FunctionAttrs run that will update the initializes attribute according to the new IR?

It is impractical. We should make sure that InstCombine performs a valid refinement.

@aeubanks
Copy link
Contributor

aeubanks commented Apr 3, 2025

Reduced: https://alive2.llvm.org/ce/z/ahKwC-

define void @src(ptr initializes((0, 1)) %0) {
  %2 = alloca i8
  call void @llvm.memcpy.p0.p0.i64(ptr %0, ptr %2, i64 1, i1 false)
  ret void
}

Instcombine removes the memcpy. I don't see what's wrong with this. It's ok for instcombine to remove the memcpy since anything is at least as defined as poison. The caller cannot assume that contents of the memory are anything other than poison after @src, but in @src we don't actually have to do the memcpy.

@nunoplopes

@nikic
Copy link
Contributor

nikic commented Apr 3, 2025

@aeubanks The InstCombine transform is (intended to be) valid, the problem here is just with the spec wording for the initializes attribute.

Should we adjust initializes to only account for writing with well-defined values?

No. "Initializing" with uninitialized memory is fine for the purposes of this attribute.

The elegant way to specify the semantics operationally would be analogous to how we specify writable: initializes((Lo, Hi)) writes the bytes (Lo, Hi) with undef at the entry of the function. This justifies both DSE (can remove a write before an undef write) and inference (can insert an undef write before a write), while still allowing this InstCombine transform (can remove an undef write after an undef write).

Unfortunately, it's not as simple as this, because we can't just say the undef write happens at the entry of the function. initializes only wants to guarantee that "initialization" happens at some point on the path to a normal return. It may be omitted on paths that unwind or diverge.

So I think the way to reconcile this would be to essentially keep the current spec, but add that any byte range that is part of initializes but has not been written at the point of return is written with undef then.

@nikic
Copy link
Contributor

nikic commented Apr 3, 2025

See also #133038 for a related issue. If I take these two together, a way to address both would be:

  • If a byte that is part of initializes is stored, mark it as "initialized".
  • If a byte that is part of initializes is loaded, and the byte is not marked as "initialized", return undef. Otherwise perform a normal load.
  • If the function returns, write undef to all bytes that are part of initializes and not marked as "initialized".

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Apr 4, 2025

See also #133038 for a related issue. If I take these two together, a way to address both would be:

  • If a byte that is part of initializes is stored, mark it as "initialized".
  • If a byte that is part of initializes is loaded, and the byte is not marked as "initialized", return undef. Otherwise perform a normal load.
  • If the function returns, write undef to all bytes that are part of initializes and not marked as "initialized".

I am fine with this solution.

@nikic nikic self-assigned this Apr 4, 2025
nikic added a commit to nikic/llvm-project that referenced this issue Apr 4, 2025
Specify the initializes attribute in terms of an "initialized"
shadow state, such that:

 * Loads prior to initialization return undef.
 * Bytes that are not explicitly initialized are written with
   undef on function return.

This is intended to preserve the core semantics of the attribute,
but adjusts the wording in a way that is compatible with existing
optimizations, such as insertion of spurious loads and removal
of uninitialized writes.

Fixes llvm#133038.
Fixes llvm#133059.
@nikic nikic closed this as completed in 03f21e2 Apr 8, 2025
dtcxzyw added a commit to dtcxzyw/llvm-ub-aware-interpreter that referenced this issue Apr 8, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Apr 8, 2025
Specify the initializes attribute in terms of an "initialized" shadow
state, such that:

* Loads prior to initialization return poison.
* Bytes that are not explicitly initialized are written with undef on
function return.

This is intended to preserve the core semantics of the attribute, but
adjusts the wording in a way that is compatible with existing
optimizations, such as insertion of spurious loads and removal of
uninitialized writes.

Fixes llvm/llvm-project#133038.
Fixes llvm/llvm-project#133059.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants