
[AMDGPU] Add alignment attr & propagate alignment through make.buffer.rsrc inst #145278


Open
wants to merge 11 commits into base: main
Conversation

Shoreshen
Contributor

@Shoreshen commented Jun 23, 2025

This PR adds the alignment attribute to the AMDGPU backend for make.buffer.rsrc. It will:

  1. Compute the alignment required by the users of make.buffer.rsrc
  2. Compute the alignment known for operand 0 of make.buffer.rsrc
  3. Propagate the alignment between the two

For the following example:

define float @align_back_prop(ptr addrspace(1) align 4 %x) {
  %fat.ptr = call ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) %x, i16 0, i32 256, i32 0)
  %y = load float, ptr addrspace(7) %fat.ptr, align 8
  ret float %y
}

On this example, the PR will:

  1. For the users of make.buffer.rsrc, require that %fat.ptr be aligned to 8 (required by the load)
  2. For operand 0 of make.buffer.rsrc, note that %x is known to be aligned to 4
  3. Propagate the larger alignment (8, on %fat.ptr) back to %x, so %x also ends up with alignment 8, as shown below
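After the amdgpu-attributor pass runs, %x carries the propagated alignment; this is exactly what the @align_back_prop test added by this PR checks:

define float @align_back_prop(ptr addrspace(1) align 8 %x) {
  %fat.ptr = call ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) %x, i16 0, i32 256, i32 0)
  %y = load float, ptr addrspace(7) %fat.ptr, align 8
  ret float %y
}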

@llvmbot
Member

llvmbot commented Jun 23, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: None (Shoreshen)

Changes
  1. Add the alignment attribute
  2. For the make.buffer.rsrc intrinsic, propagate the alignment attribute between the returned value and operand 0

Full diff: https://github.com/llvm/llvm-project/pull/145278.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Transforms/IPO/Attributor.h (+22)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+36-1)
  • (modified) llvm/lib/Transforms/IPO/AttributorAttributes.cpp (+12-2)
  • (added) llvm/test/CodeGen/AMDGPU/attr-amdgpu-align.ll (+40)
diff --git a/llvm/include/llvm/Transforms/IPO/Attributor.h b/llvm/include/llvm/Transforms/IPO/Attributor.h
index e6eb756df987d..64285c2114976 100644
--- a/llvm/include/llvm/Transforms/IPO/Attributor.h
+++ b/llvm/include/llvm/Transforms/IPO/Attributor.h
@@ -1355,6 +1355,12 @@ struct InformationCache {
   /// Return the flat address space if the associated target has.
   LLVM_ABI std::optional<unsigned> getFlatAddressSpace() const;
 
+  virtual bool shouldTrackUse(const AbstractAttribute *QueryingAA,
+                              Value &AssociatedValue, const Use *U,
+                              const Instruction *I) const {
+    return false;
+  }
+
 private:
   struct FunctionInfo {
     LLVM_ABI ~FunctionInfo();
@@ -2042,6 +2048,19 @@ struct Attributor {
     SimplificationCallbacks[IRP].emplace_back(CB);
   }
 
+  using AlignmentCallbackTy =
+      std::function<void(const IRPosition &, const AbstractAttribute *,
+                         SmallVectorImpl<AA::ValueAndContext> &)>;
+  void registerAlignmentCallback(const IRPosition &IRP,
+                                 const AlignmentCallbackTy &CB) {
+    AlignmentCallBacks[IRP].emplace_back(CB);
+  }
+
+  SmallVector<AlignmentCallbackTy, 1>
+  getAlignmentCallback(const IRPosition &IRP) {
+    return AlignmentCallBacks.lookup(IRP);
+  }
+
   /// Return true if there is a simplification callback for \p IRP.
   bool hasSimplificationCallback(const IRPosition &IRP) {
     return SimplificationCallbacks.count(IRP);
@@ -2093,6 +2112,9 @@ struct Attributor {
   DenseMap<IRPosition, SmallVector<SimplifictionCallbackTy, 1>>
       SimplificationCallbacks;
 
+  /// The vector with AAAlign callbacks registered by outside AAs.
+  DenseMap<IRPosition, SmallVector<AlignmentCallbackTy, 1>> AlignmentCallBacks;
+
   /// The vector with all simplification callbacks for global variables
   /// registered by outside AAs.
   DenseMap<const GlobalVariable *,
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index f4d3a014f9921..0731dcfbcd05c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -138,6 +138,18 @@ static bool funcRequiresHostcallPtr(const Function &F) {
          F.hasFnAttribute(Attribute::SanitizeMemTag);
 }
 
+static bool isAlignAndMakeBuffer(const AbstractAttribute *AA,
+                                 const Instruction *I) {
+  if (isa<AAAlign>(AA)) {
+    if (const auto *II = dyn_cast<IntrinsicInst>(I)) {
+      if (II->getIntrinsicID() == Intrinsic::amdgcn_make_buffer_rsrc)
+        return true;
+    }
+  }
+
+  return false;
+}
+
 namespace {
 class AMDGPUInformationCache : public InformationCache {
 public:
@@ -235,6 +247,12 @@ class AMDGPUInformationCache : public InformationCache {
     return ST.getMaxWavesPerEU();
   }
 
+  bool shouldTrackUse(const AbstractAttribute *QueryingAA,
+                      Value &AssociatedValue, const Use *U,
+                      const Instruction *I) const override {
+    return isAlignAndMakeBuffer(QueryingAA, I);
+  }
+
 private:
   /// Check if the ConstantExpr \p CE uses an addrspacecast from private or
   /// local to flat. These casts may require the queue pointer.
@@ -1381,7 +1399,7 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
        &AAAMDMaxNumWorkgroups::ID, &AAAMDWavesPerEU::ID, &AAAMDGPUNoAGPR::ID,
        &AACallEdges::ID, &AAPointerInfo::ID, &AAPotentialConstantValues::ID,
        &AAUnderlyingObjects::ID, &AAAddressSpace::ID, &AAIndirectCallInfo::ID,
-       &AAInstanceInfo::ID});
+       &AAInstanceInfo::ID, &AAAlign::ID});
 
   AttributorConfig AC(CGUpdater);
   AC.IsClosedWorldModule = Options.IsClosedWorld;
@@ -1432,6 +1450,23 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
       } else if (auto *CmpX = dyn_cast<AtomicCmpXchgInst>(&I)) {
         A.getOrCreateAAFor<AAAddressSpace>(
             IRPosition::value(*CmpX->getPointerOperand()));
+      } else if (auto *II = dyn_cast<IntrinsicInst>(&I)) {
+        if (II->getIntrinsicID() == Intrinsic::amdgcn_make_buffer_rsrc) {
+          IRPosition IRP = IRPosition::inst(*II);
+
+          Attributor::AlignmentCallbackTy ACB =
+              [](const IRPosition &IRP, const AbstractAttribute *AA,
+                 SmallVectorImpl<AA::ValueAndContext> &Values) {
+                if (auto *I = dyn_cast<Instruction>(&IRP.getAssociatedValue()))
+                  if (isAlignAndMakeBuffer(AA, I)) {
+                    Values.push_back(
+                        AA::ValueAndContext{*I->getOperand(0), nullptr});
+                  }
+              };
+          A.registerAlignmentCallback(IRP, ACB);
+
+          A.getOrCreateAAFor<AAAlign>(IRP);
+        }
       }
     }
   }
diff --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index 3799a696f67af..cca03b30e75c7 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -5202,6 +5202,10 @@ static unsigned getKnownAlignForUse(Attributor &A, AAAlign &QueryingAA,
       TrackUse = true;
     return 0;
   }
+  if (A.getInfoCache().shouldTrackUse(&QueryingAA, AssociatedValue, U, I)) {
+    TrackUse = true;
+    return 0;
+  }
 
   MaybeAlign MA;
   if (const auto *CB = dyn_cast<CallBase>(I)) {
@@ -5369,8 +5373,14 @@ struct AAAlignFloating : AAAlignImpl {
     bool Stripped;
     bool UsedAssumedInformation = false;
     SmallVector<AA::ValueAndContext> Values;
-    if (!A.getAssumedSimplifiedValues(getIRPosition(), *this, Values,
-                                      AA::AnyScope, UsedAssumedInformation)) {
+    const auto &AligmentCBs = A.getAlignmentCallback(getIRPosition());
+    if (!AligmentCBs.empty()) {
+      for (const auto &CB : AligmentCBs) {
+        CB(getIRPosition(), this, Values);
+      }
+    } else if (!A.getAssumedSimplifiedValues(getIRPosition(), *this, Values,
+                                             AA::AnyScope,
+                                             UsedAssumedInformation)) {
       Values.push_back({getAssociatedValue(), getCtxI()});
       Stripped = false;
     } else {
diff --git a/llvm/test/CodeGen/AMDGPU/attr-amdgpu-align.ll b/llvm/test/CodeGen/AMDGPU/attr-amdgpu-align.ll
new file mode 100644
index 0000000000000..85f77735bf2b6
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/attr-amdgpu-align.ll
@@ -0,0 +1,40 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt  -S -mtriple=amdgcn-amd-amdhsa -passes=amdgpu-attributor %s -o - | FileCheck %s
+
+define float @align_back_prop(ptr addrspace(1) align 4 %x) {
+; CHECK-LABEL: define float @align_back_prop(
+; CHECK-SAME: ptr addrspace(1) align 8 [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[FAT_PTR:%.*]] = call ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) [[X]], i16 0, i32 256, i32 0)
+; CHECK-NEXT:    [[Y:%.*]] = load float, ptr addrspace(7) [[FAT_PTR]], align 8
+; CHECK-NEXT:    ret float [[Y]]
+;
+  %fat.ptr = call ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) %x, i16 0, i32 256, i32 0)
+  %y = load float, ptr addrspace(7) %fat.ptr, align 8
+  ret float %y
+}
+
+define float @align_foward_prop(ptr addrspace(1) align 8 %x) {
+; CHECK-LABEL: define float @align_foward_prop(
+; CHECK-SAME: ptr addrspace(1) align 8 [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[FAT_PTR:%.*]] = call ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) [[X]], i16 0, i32 256, i32 0)
+; CHECK-NEXT:    [[Y:%.*]] = load float, ptr addrspace(7) [[FAT_PTR]], align 8
+; CHECK-NEXT:    ret float [[Y]]
+;
+  %fat.ptr = call ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) %x, i16 0, i32 256, i32 0)
+  %y = load float, ptr addrspace(7) %fat.ptr, align 4
+  ret float %y
+}
+
+define float @align_mix_prop(ptr addrspace(1) align 4 %x) {
+; CHECK-LABEL: define float @align_mix_prop(
+; CHECK-SAME: ptr addrspace(1) align 8 [[X:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[FAT_PTR:%.*]] = call ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) [[X]], i16 0, i32 256, i32 0)
+; CHECK-NEXT:    [[Y:%.*]] = load float, ptr addrspace(7) [[FAT_PTR]], align 8
+; CHECK-NEXT:    [[Z:%.*]] = load float, ptr addrspace(1) [[X]], align 8
+; CHECK-NEXT:    ret float [[Z]]
+;
+  %fat.ptr = call ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) %x, i16 0, i32 256, i32 0)
+  %y = load float, ptr addrspace(7) %fat.ptr, align 2
+  %z = load float, ptr addrspace(1) %x, align 8
+  ret float %z
+}

Contributor

@krzysz00 left a comment


This seems reasonable overall, though I'll leave final review to folks familiar with the attributor.

One question: how much work would it be to materialize the returned alignment as an attribute on the make.buffer.rsrc call? I figure it'll help make it more obvious what's going on.
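Concretely, that would presumably mean emitting the standard align return attribute on the call site, e.g. (hypothetical output; the PR does not emit this yet):

  %fat.ptr = call align 8 ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) %x, i16 0, i32 256, i32 0)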

Contributor

@shiltian left a comment


Some fundamental issues need to be resolved first.

@@ -1355,6 +1355,12 @@ struct InformationCache {
  /// Return the flat address space if the associated target has.
  LLVM_ABI std::optional<unsigned> getFlatAddressSpace() const;

  virtual bool shouldTrackUse(const AbstractAttribute *QueryingAA,
Contributor


Why do we need this?

Contributor Author


Hi @shiltian, this is used for forward propagation (which happens in the initialization of each AAAlign).

When followUsesInMBEC is called, if this returns true, the user's users are also put into the list used for computing the known alignment.
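A minimal sketch of that chain, reusing the example from the PR description: when computing AAAlign for %x, the use walk would normally stop at the make.buffer.rsrc call; with shouldTrackUse returning true it also visits the load (the user's user), whose align 8 then becomes known for %x.

  ; direct user of %x:
  %fat.ptr = call ptr addrspace(7) @llvm.amdgcn.make.buffer.rsrc.p7.p1(ptr addrspace(1) %x, i16 0, i32 256, i32 0)
  ; user's user, supplies the known align 8:
  %y = load float, ptr addrspace(7) %fat.ptr, align 8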

[](const IRPosition &IRP, const AbstractAttribute *AA,
   SmallVectorImpl<AA::ValueAndContext> &Values) {
  if (auto *I = dyn_cast<Instruction>(&IRP.getAssociatedValue()))
    if (isAlignAndMakeBuffer(AA, I)) {
Contributor


You don't need to check whether it is AAAlign here, since this CB is only for align.


SmallVector<AlignmentCallbackTy, 1>
getAlignmentCallback(const IRPosition &IRP) {
  return AlignmentCallBacks.lookup(IRP);
Contributor


What if the lookup fails? I'd prefer a similar style to the simplification CB.

Contributor

@arsenm left a comment


The description should be clearer about what this is. This isn't adding anything to the intrinsic itself.

I think the only thing special about this case is it's an intrinsic that propagates a pointer operand, similar to llvm.ptrmask. It looks like AAAlign also fails to propagate the alignment (which may have been improved by the ptrmask). Can you start by adding support for llvm.ptrmask, and then generalizing the support to other pointer intrinsics?

@@ -138,6 +138,18 @@ static bool funcRequiresHostcallPtr(const Function &F) {
         F.hasFnAttribute(Attribute::SanitizeMemTag);
}

static bool isAlignAndMakeBuffer(const AbstractAttribute *AA,
                                 const Instruction *I) {
  if (isa<AAAlign>(AA)) {
Contributor


Probably shouldn't need to ever identify AAs like this?

@@ -1381,7 +1399,7 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
        &AAAMDMaxNumWorkgroups::ID, &AAAMDWavesPerEU::ID, &AAAMDGPUNoAGPR::ID,
        &AACallEdges::ID, &AAPointerInfo::ID, &AAPotentialConstantValues::ID,
        &AAUnderlyingObjects::ID, &AAAddressSpace::ID, &AAIndirectCallInfo::ID,
-       &AAInstanceInfo::ID});
+       &AAInstanceInfo::ID, &AAAlign::ID});
Contributor


Adding this to AMDGPUAttributor should be a separate patch; you can do this in the base attributor first.

Contributor

@shiltian Jun 25, 2025


Agreed. You can add support for some generic LLVM intrinsics and test it.

@krzysz00
Contributor

Good point, seconding "let's teach AAAlign about ptrmask"

@Shoreshen
Contributor Author

Shoreshen commented Jun 25, 2025

This seems reasonable overall, though I'll leave final review to folks familiar with the attributor.

One question: how much work would it be to materialize the returned alignment as an attribute on the make.buffer.rsrc call? I figure it'll help make it more obvious what's going on.

Hi @krzysz00, do you mean adding the alignment attribute on the call to the intrinsic?

@Shoreshen
Contributor Author

Shoreshen commented Jun 25, 2025

The description should be clearer about what this is. This isn't adding anything to the intrinsic itself.

I think the only thing special about this case is it's an intrinsic that propagates a pointer operand, similar to llvm.ptrmask. It looks like AAAlign also fails to propagate the alignment (which may have been improved by the ptrmask). Can you start by adding support for llvm.ptrmask, and then generalizing the support to other pointer intrinsics?

Hi @arsenm, supporting llvm.ptrmask requires extending AAAlign's abilities.

Currently the only thing we can do is tell AAAlign that the alignment of the call is the same as that of some operand of the call.

As I understand it, llvm.ptrmask is a little different: for %ret = call ptr addrspace(7) @llvm.ptrmask.p7.i32(ptr addrspace(7) %p, i32 -4) it means ret = p & -4, which needs a computation based on both operands.
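A minimal sketch of that difference (illustrative, using the default address space, where the mask type is i64): the mask can raise the known alignment above that of the source pointer, so plain same-as-operand propagation is not enough.

define ptr @mask_raises_align(ptr align 2 %p) {
  ; -16 clears the low four bits, so %ret is known align 16
  ; even though %p is only known align 2.
  %ret = call ptr @llvm.ptrmask.p0.i64(ptr %p, i64 -16)
  ret ptr %ret
}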

@arsenm
Contributor

arsenm commented Jun 25, 2025

ptrmask shouldn't be any different from getelementptr: https://godbolt.org/z/nbe4x7qnd

It requires propagating the source attribute, and then interpreting it through the instruction to reach the return
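For comparison, a sketch of the getelementptr case AAAlign already handles (illustrative, not taken from the godbolt link): the source attribute is interpreted through the instruction to derive the result's alignment.

define ptr @gep_align(ptr align 8 %p) {
  ; base known align 8 plus constant offset 4 => result known align 4
  %g = getelementptr i8, ptr %p, i64 4
  ret ptr %g
}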

@llvmbot added the clang label Jun 25, 2025
    SmallVector<AA::ValueAndContext> Values;
    const auto &AligmentCBs = A.getAlignmentCallback(getIRPosition());

    if (!AligmentCBs.empty()) {
Contributor


Since lookup returns a default-constructed value, you don't need to check emptiness here.

Comment on lines 5512 to 5514
    for (const auto &CB : AligmentCBs) {
      CB(getIRPosition(), this, Values);
    }
Contributor


Suggested change:

-    for (const auto &CB : AligmentCBs) {
-      CB(getIRPosition(), this, Values);
-    }
+    for (const auto &CB : AligmentCBs)
+      CB(getIRPosition(), this, Values);

@@ -5500,7 +5504,34 @@ struct AAAlignCallSiteReturned final
  using Base = AACalleeToCallSite<AAAlign, AAAlignImpl>;
  AAAlignCallSiteReturned(const IRPosition &IRP, Attributor &A)
      : Base(IRP, A) {}

  ChangeStatus updateImpl(Attributor &A) override {
    SmallVector<AA::ValueAndContext> Values;
    const auto &AligmentCBs = A.getAlignmentCallback(getIRPosition());
Contributor


no auto


    if (!Values.empty()) {
      StateType T;
      for (const auto &VAC : Values) {
Contributor


no auto

Labels
backend:AMDGPU, clang, llvm:transforms