Skip to content

[LangRef] Update initializes definition #134370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 8, 2025
Merged

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Apr 4, 2025

Specify the initializes attribute in terms of an "initialized" shadow state, such that:

  • Loads prior to initialization return undef.
  • Bytes that are not explicitly initialized are written with undef on function return.

This is intended to preserve the core semantics of the attribute, but adjusts the wording in a way that is compatible with existing optimizations, such as insertion of spurious loads and removal of uninitialized writes.

Fixes #133038. Fixes #133059.

Specify the initializes attribute in terms of an "initialized"
shadow state, such that:

 * Loads prior to initialization return undef.
 * Bytes that are not explicitly initialized are written with
   undef on function return.

This is intended to preserve the core semantics of the attribute,
but adjusts the wording in a way that is compatible with existing
optimizations, such as insertion of spurious loads and removal
of uninitialized writes.

Fixes llvm#133038.
Fixes llvm#133059.
@nikic nikic requested review from aeubanks, dtcxzyw and haopliu April 4, 2025 12:13
@llvmbot llvmbot added the llvm:ir label Apr 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 4, 2025

@llvm/pr-subscribers-llvm-ir

Author: Nikita Popov (nikic)

Changes

Specify the initializes attribute in terms of an "initialized" shadow state, such that:

  • Loads prior to initialization return undef.
  • Bytes that are not explicitly initialized are written with undef on function return.

This is intended to preserve the core semantics of the attribute, but adjusts the wording in a way that is compatible with existing optimizations, such as insertion of spurious loads and removal of uninitialized writes.

Fixes #133038. Fixes #133059.


Full diff: https://github.com/llvm/llvm-project/pull/134370.diff

1 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+18-4)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d242c945816cc..8b423971b94db 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -1690,10 +1690,24 @@ Currently, only the following parameter attributes are defined:
 
 ``initializes((Lo1, Hi1), ...)``
     This attribute indicates that the function initializes the ranges of the
-    pointer parameter's memory, ``[%p+LoN, %p+HiN)``. Initialization of memory
-    means the first memory access is a non-volatile, non-atomic write. The
-    write must happen before the function returns. If the function unwinds,
-    the write may not happen.
+    pointer parameter's memory ``[%p+LoN, %p+HiN)``. Colloquially, this means
+    that all bytes in the specified range are written before the function
+    returns, and not read prior to the initializing write. If the function
+    unwinds, the write may not happen.
+
+    Formally, this is specified in terms of an "initialized" shadow state for
+    all bytes in the range, which is set to "not initialized" at function entry.
+    If a memory access is performed through a pointer based on the argument,
+    and an accessed byte has not been marked as "initialized" yet, then:
+
+     * If the byte is stored with a non-volatile, non-atomic write, mark it as
+       "initialized".
+     * If the byte is stored with a volatile or atomic write, the behavior is
+       undefined.
+     * If the byte is loaded, return an undef value.
+
+    Additionally, if the function returns normally, write an undef value to all
+    bytes that are part of the range and have not been marked as "initialized".
 
     This attribute only holds for the memory accessed via this pointer
     parameter. Other arbitrary accesses to the same memory via other pointers

@dtcxzyw dtcxzyw requested a review from nunoplopes April 4, 2025 12:30
@aeubanks
Copy link
Contributor

aeubanks commented Apr 4, 2025

how does undef vs poison work here?

@dtcxzyw
Copy link
Member

dtcxzyw commented Apr 6, 2025

how does undef vs poison work here?

Whether the poison value is allowed in memory is still an open problem: https://discourse.llvm.org/t/rfc-add-nopoison-attribute-metadata/79833.

If we return poison when loading from an "uninitialized" byte, elimination of store with undef value in InstCombine should be treated as miscompilation.

@nikic
Copy link
Contributor Author

nikic commented Apr 6, 2025

Yes, the wording here is for our current semantics where uninitialized memory is undef and memory (de facto if not de jure) cannot contain poison. If we make uninitialized memory in the future, we'd replace undef with poison in this wording as well.

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please wait for additional approval from other reviewers.

"initialized".
* If the byte is stored with a volatile or atomic write, the behavior is
undefined.
* If the byte is loaded, return an undef value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be poison, since initializes is only added to function arguments when they are initialized. And reduces our use of undef.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning poison is a bit problematic due to load poison widening, but I would be okay with specifying poison here. (As we currently just ignore the whole load poison widening issue everywhere anyway.)

@nunoplopes
Copy link
Member

On higher-level question: is this attribute useful as specified?
The caller only knows that the initialized range was written to, but it can be poison, undef, or a concrete value. There's nothing it can assume about the stored value. All it knows it that the range is dereferenceable, but for that we already have an attribute.
What am I missing?

@nikic
Copy link
Contributor Author

nikic commented Apr 7, 2025

On higher-level question: is this attribute useful as specified? The caller only knows that the initialized range was written to, but it can be poison, undef, or a concrete value. There's nothing it can assume about the stored value. All it knows it that the range is dereferenceable, but for that we already have an attribute. What am I missing?

This attribute is used to DSE stores before the call. We don't care about what the value after the call is, only about dropping stores before the call.

@nunoplopes
Copy link
Member

On higher-level question: is this attribute useful as specified? The caller only knows that the initialized range was written to, but it can be poison, undef, or a concrete value. There's nothing it can assume about the stored value. All it knows it that the range is dereferenceable, but for that we already have an attribute. What am I missing?

This attribute is used to DSE stores before the call. We don't care about what the value after the call is, only about dropping stores before the call.

Ok, thanks, yes, that sounds good.
Writing undef bit if not initialized is a necessary evil because we elide store of undefs :/

@nikic nikic merged commit 03f21e2 into llvm:main Apr 8, 2025
10 checks passed
@nikic nikic deleted the langref-initializes branch April 8, 2025 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[InstCombine][FuncAttrs] InstCombine breaks initializes attribute [LICM] LICM breaks initializes attribute
5 participants