-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NVPTX] Cleanup and refactor atomic lowering #133781
[NVPTX] Cleanup and refactor atomic lowering #133781
Conversation
@llvm/pr-subscribers-backend-nvptx Author: Alex MacLean (AlexMaclean) ChangesCleanup lowering of atomic instructions and intrninsics. The TableGen changes are primarily a refactor, though sub variants are now lowered via operation legalization, potentially allowing for more DAG optimization. Patch is 48.91 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133781.diff 4 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 8a4b83365ae84..da604ae8b0c17 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -994,6 +994,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
setOperationAction(ISD::ADDRSPACECAST, {MVT::i32, MVT::i64}, Custom);
+
+ setOperationAction(ISD::ATOMIC_LOAD_SUB, {MVT::i32, MVT::i64}, Expand);
// No FPOW or FREM in PTX.
// Now deduce the information based on the above mentioned
diff --git a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
index fe9bb621b481c..7d0c47fa464c5 100644
--- a/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
+++ b/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
@@ -216,16 +216,25 @@ class fpimm_pos_inf<ValueType vt>
// Utility class to wrap up information about a register and DAG type for more
// convenient iteration and parameterization
-class RegTyInfo<ValueType ty, NVPTXRegClass rc, Operand imm> {
+class RegTyInfo<ValueType ty, NVPTXRegClass rc, Operand imm, SDNode imm_node,
+ bit supports_imm = 1> {
ValueType Ty = ty;
NVPTXRegClass RC = rc;
Operand Imm = imm;
+ SDNode ImmNode = imm_node;
+ bit SupportsImm = supports_imm;
int Size = ty.Size;
}
-def I16RT : RegTyInfo<i16, Int16Regs, i16imm>;
-def I32RT : RegTyInfo<i32, Int32Regs, i32imm>;
-def I64RT : RegTyInfo<i64, Int64Regs, i64imm>;
+def I16RT : RegTyInfo<i16, Int16Regs, i16imm, imm>;
+def I32RT : RegTyInfo<i32, Int32Regs, i32imm, imm>;
+def I64RT : RegTyInfo<i64, Int64Regs, i64imm, imm>;
+
+def F32RT : RegTyInfo<f32, Float32Regs, f32imm, fpimm>;
+def F64RT : RegTyInfo<f64, Float64Regs, f64imm, fpimm>;
+def F16RT : RegTyInfo<f16, Int16Regs, f16imm, fpimm, supports_imm = 0>;
+def BF16RT : RegTyInfo<bf16, Int16Regs, bf16imm, fpimm, supports_imm = 0>;
+
// Template for instructions which take three int64, int32, or int16 args.
// The instructions are named "<OpcStr><Width>" (e.g. "add.s64").
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index b2e05a567b4fe..a0a6cdcaafc2a 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -1975,529 +1975,135 @@ def INT_FNS_iii : INT_FNS_MBO<(ins i32imm:$mask, i32imm:$base, i32imm:$
// Atomic Functions
//-----------------------------------
-class ATOMIC_GLOBAL_CHK <dag ops, dag frag>
- : PatFrag<ops, frag, AS_match.global>;
-class ATOMIC_SHARED_CHK <dag ops, dag frag>
- : PatFrag<ops, frag, AS_match.shared>;
-class ATOMIC_GENERIC_CHK <dag ops, dag frag>
- : PatFrag<ops, frag, AS_match.generic>;
-
-multiclass F_ATOMIC_2<
- ValueType regT, NVPTXRegClass regclass,
- string SpaceStr, string TypeStr, string OpcStr, PatFrag IntOp,
- Operand IMMType, SDNode IMM, list<Predicate> Pred = []> {
- let mayLoad = 1, mayStore = 1, hasSideEffects = 1 in {
- def r : NVPTXInst<(outs regclass:$dst), (ins ADDR:$addr, regclass:$b),
- "atom" # SpaceStr # OpcStr # TypeStr # " \t$dst, [$addr], $b;",
- [(set (regT regclass:$dst), (IntOp addr:$addr, (regT regclass:$b)))]>,
- Requires<Pred>;
- if !not(!or(!eq(TypeStr, ".f16"), !eq(TypeStr, ".bf16"))) then
- def i : NVPTXInst<(outs regclass:$dst), (ins ADDR:$addr, IMMType:$b),
- "atom" # SpaceStr # OpcStr # TypeStr # " \t$dst, [$addr], $b;",
- [(set (regT regclass:$dst), (IntOp addr:$addr, IMM:$b))]>,
- Requires<Pred>;
- }
-}
+class ATOMIC_GLOBAL_CHK <dag frag>
+ : PatFrag<!setdagop(frag, ops), frag, AS_match.global>;
+class ATOMIC_SHARED_CHK <dag frag>
+ : PatFrag<!setdagop(frag, ops), frag, AS_match.shared>;
+class ATOMIC_GENERIC_CHK <dag frag>
+ : PatFrag<!setdagop(frag, ops), frag, AS_match.generic>;
+
-// has 2 operands, neg the second one
-multiclass F_ATOMIC_2_NEG<
- ValueType regT, NVPTXRegClass regclass,
- string SpaceStr, string TypeStr, string OpcStr, PatFrag IntOp,
- list<Predicate> Pred = []> {
+multiclass F_ATOMIC_2<RegTyInfo t, string sem_str, string as_str, string op_str,
+ SDPatternOperator op, list<Predicate> preds> {
+ defvar asm_str = "atom" # sem_str # as_str # "." # op_str # " \t$dst, [$addr], $b;";
let mayLoad = 1, mayStore = 1, hasSideEffects = 1 in {
- def reg : NVPTXInst<(outs regclass:$dst), (ins ADDR:$addr, regclass:$b),
- !strconcat(
- "{{ \n\t",
- ".reg \t.s", TypeStr, " temp; \n\t",
- "neg.s", TypeStr, " \ttemp, $b; \n\t",
- "atom", SpaceStr, OpcStr, ".u", TypeStr, " \t$dst, [$addr], temp; \n\t",
- "}}"),
- [(set (regT regclass:$dst), (IntOp addr:$addr, (regT regclass:$b)))]>,
- Requires<Pred>;
+ def r : NVPTXInst<(outs t.RC:$dst), (ins ADDR:$addr, t.RC:$b),
+ asm_str,
+ [(set t.Ty:$dst, (op addr:$addr, t.Ty:$b))]>,
+ Requires<preds>;
+ if t.SupportsImm then
+ def i : NVPTXInst<(outs t.RC:$dst), (ins ADDR:$addr, t.Imm:$b),
+ asm_str,
+ [(set t.Ty:$dst, (op addr:$addr, (t.Ty t.ImmNode:$b)))]>,
+ Requires<preds>;
}
}
// has 3 operands
-multiclass F_ATOMIC_3<
- ValueType regT, NVPTXRegClass regclass, string SemStr,
- string SpaceStr, string TypeStr, string OpcStr, PatFrag IntOp,
- Operand IMMType, list<Predicate> Pred = []> {
+multiclass F_ATOMIC_3<RegTyInfo t, string sem_str, string as_str, string op_str,
+ SDPatternOperator op, list<Predicate> preds> {
+ defvar asm_str = "atom" # sem_str # as_str # "." # op_str # " \t$dst, [$addr], $b, $c;";
let mayLoad = 1, mayStore = 1, hasSideEffects = 1 in {
- def rr : NVPTXInst<(outs regclass:$dst),
- (ins ADDR:$addr, regclass:$b, regclass:$c),
- "atom" # SemStr # SpaceStr # OpcStr # TypeStr # " \t$dst, [$addr], $b, $c;",
- [(set (regT regclass:$dst), (IntOp addr:$addr, regT:$b, regT:$c))]>,
- Requires<Pred>;
-
- def ir : NVPTXInst<(outs regclass:$dst),
- (ins ADDR:$addr, IMMType:$b, regclass:$c),
- "atom" # SemStr # SpaceStr # OpcStr # TypeStr # " \t$dst, [$addr], $b, $c;",
- [(set (regT regclass:$dst), (IntOp addr:$addr, imm:$b, regT:$c))]>,
- Requires<Pred>;
-
- def ri : NVPTXInst<(outs regclass:$dst),
- (ins ADDR:$addr, regclass:$b, IMMType:$c),
- "atom" # SemStr # SpaceStr # OpcStr # TypeStr # " \t$dst, [$addr], $b, $c;",
- [(set (regT regclass:$dst), (IntOp addr:$addr, regT:$b, imm:$c))]>,
- Requires<Pred>;
-
- def ii : NVPTXInst<(outs regclass:$dst),
- (ins ADDR:$addr, IMMType:$b, IMMType:$c),
- "atom" # SemStr # SpaceStr # OpcStr # TypeStr # " \t$dst, [$addr], $b, $c;",
- [(set (regT regclass:$dst), (IntOp addr:$addr, imm:$b, imm:$c))]>,
- Requires<Pred>;
+ def rr : NVPTXInst<(outs t.RC:$dst),
+ (ins ADDR:$addr, t.RC:$b, t.RC:$c),
+ asm_str,
+ [(set t.Ty:$dst, (op addr:$addr, t.Ty:$b, t.Ty:$c))]>,
+ Requires<preds>;
+
+ def ir : NVPTXInst<(outs t.RC:$dst),
+ (ins ADDR:$addr, t.Imm:$b, t.RC:$c),
+ asm_str,
+ [(set t.Ty:$dst, (op addr:$addr, (t.Ty t.ImmNode:$b), t.Ty:$c))]>,
+ Requires<preds>;
+
+ def ri : NVPTXInst<(outs t.RC:$dst),
+ (ins ADDR:$addr, t.RC:$b, t.Imm:$c),
+ asm_str,
+ [(set t.Ty:$dst, (op addr:$addr, t.Ty:$b, (t.Ty t.ImmNode:$c)))]>,
+ Requires<preds>;
+
+ def ii : NVPTXInst<(outs t.RC:$dst),
+ (ins ADDR:$addr, t.Imm:$b, t.Imm:$c),
+ asm_str,
+ [(set t.Ty:$dst, (op addr:$addr, (t.Ty t.ImmNode:$b), (t.Ty t.ImmNode:$c)))]>,
+ Requires<preds>;
}
}
+multiclass F_ATOMIC_2_AS<RegTyInfo t, SDPatternOperator frag, string op_str, list<Predicate> preds = []> {
+ defvar frag_pat = (frag node:$a, node:$b);
+ defm _G : F_ATOMIC_2<t, "", ".global", op_str, ATOMIC_GLOBAL_CHK<frag_pat>, preds>;
+ defm _S : F_ATOMIC_2<t, "", ".shared", op_str, ATOMIC_SHARED_CHK<frag_pat>, preds>;
+ defm _GEN : F_ATOMIC_2<t, "", "", op_str, ATOMIC_GENERIC_CHK<frag_pat>, preds>;
+}
+
+multiclass F_ATOMIC_3_AS<RegTyInfo t, SDPatternOperator frag, string sem_str, string op_str, list<Predicate> preds = []> {
+ defvar frag_pat = (frag node:$a, node:$b, node:$c);
+ defm _G : F_ATOMIC_3<t, sem_str, ".global", op_str, ATOMIC_GLOBAL_CHK<frag_pat>, preds>;
+ defm _S : F_ATOMIC_3<t, sem_str, ".shared", op_str, ATOMIC_SHARED_CHK<frag_pat>, preds>;
+ defm _GEN : F_ATOMIC_3<t, sem_str, "", op_str, ATOMIC_GENERIC_CHK<frag_pat>, preds>;
+}
+
// atom_add
+defm INT_PTX_ATOM_ADD_32 : F_ATOMIC_2_AS<I32RT, atomic_load_add_i32, "add.u32">;
+defm INT_PTX_ATOM_ADD_64 : F_ATOMIC_2_AS<I64RT, atomic_load_add_i64, "add.u64">;
-def atomic_load_add_i32_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b),
- (atomic_load_add_i32 node:$a, node:$b)>;
-def atomic_load_add_i32_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_load_add_i32 node:$a, node:$b)>;
-def atomic_load_add_i32_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_load_add_i32 node:$a, node:$b)>;
-def atomic_load_add_i64_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b),
- (atomic_load_add_i64 node:$a, node:$b)>;
-def atomic_load_add_i64_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_load_add_i64 node:$a, node:$b)>;
-def atomic_load_add_i64_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_load_add_i64 node:$a, node:$b)>;
-def atomic_load_add_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b),
- (atomic_load_fadd node:$a, node:$b)>;
-def atomic_load_add_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_load_fadd node:$a, node:$b)>;
-def atomic_load_add_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_load_fadd node:$a, node:$b)>;
-
-defm INT_PTX_ATOM_ADD_G_32 : F_ATOMIC_2<i32, Int32Regs, ".global", ".u32", ".add",
- atomic_load_add_i32_g, i32imm, imm>;
-defm INT_PTX_ATOM_ADD_S_32 : F_ATOMIC_2<i32, Int32Regs, ".shared", ".u32", ".add",
- atomic_load_add_i32_s, i32imm, imm>;
-defm INT_PTX_ATOM_ADD_GEN_32 : F_ATOMIC_2<i32, Int32Regs, "", ".u32", ".add",
- atomic_load_add_i32_gen, i32imm, imm>;
-defm INT_PTX_ATOM_ADD_GEN_32_USE_G : F_ATOMIC_2<i32, Int32Regs, ".global", ".u32",
- ".add", atomic_load_add_i32_gen, i32imm, imm>;
-
-defm INT_PTX_ATOM_ADD_G_64 : F_ATOMIC_2<i64, Int64Regs, ".global", ".u64", ".add",
- atomic_load_add_i64_g, i64imm, imm>;
-defm INT_PTX_ATOM_ADD_S_64 : F_ATOMIC_2<i64, Int64Regs, ".shared", ".u64", ".add",
- atomic_load_add_i64_s, i64imm, imm>;
-defm INT_PTX_ATOM_ADD_GEN_64 : F_ATOMIC_2<i64, Int64Regs, "", ".u64", ".add",
- atomic_load_add_i64_gen, i64imm, imm>;
-defm INT_PTX_ATOM_ADD_GEN_64_USE_G : F_ATOMIC_2<i64, Int64Regs, ".global", ".u64",
- ".add", atomic_load_add_i64_gen, i64imm, imm>;
-
-defm INT_PTX_ATOM_ADD_G_F16 : F_ATOMIC_2<f16, Int16Regs, ".global", ".f16", ".add.noftz",
- atomic_load_add_g, f16imm, fpimm, [hasSM<70>, hasPTX<63>]>;
-defm INT_PTX_ATOM_ADD_S_F16 : F_ATOMIC_2<f16, Int16Regs, ".shared", ".f16", ".add.noftz",
- atomic_load_add_s, f16imm, fpimm, [hasSM<70>, hasPTX<63>]>;
-defm INT_PTX_ATOM_ADD_GEN_F16 : F_ATOMIC_2<f16, Int16Regs, "", ".f16", ".add.noftz",
- atomic_load_add_gen, f16imm, fpimm, [hasSM<70>, hasPTX<63>]>;
-
-defm INT_PTX_ATOM_ADD_G_BF16 : F_ATOMIC_2<bf16, Int16Regs, ".global", ".bf16", ".add.noftz",
- atomic_load_add_g, bf16imm, fpimm, [hasSM<90>, hasPTX<78>]>;
-defm INT_PTX_ATOM_ADD_S_BF16 : F_ATOMIC_2<bf16, Int16Regs, ".shared", ".bf16", ".add.noftz",
- atomic_load_add_s, bf16imm, fpimm, [hasSM<90>, hasPTX<78>]>;
-defm INT_PTX_ATOM_ADD_GEN_BF16 : F_ATOMIC_2<bf16, Int16Regs, "", ".bf16", ".add.noftz",
- atomic_load_add_gen, bf16imm, fpimm, [hasSM<90>, hasPTX<78>]>;
-
-defm INT_PTX_ATOM_ADD_G_F32 : F_ATOMIC_2<f32, Float32Regs, ".global", ".f32", ".add",
- atomic_load_add_g, f32imm, fpimm>;
-defm INT_PTX_ATOM_ADD_S_F32 : F_ATOMIC_2<f32, Float32Regs, ".shared", ".f32", ".add",
- atomic_load_add_s, f32imm, fpimm>;
-defm INT_PTX_ATOM_ADD_GEN_F32 : F_ATOMIC_2<f32, Float32Regs, "", ".f32", ".add",
- atomic_load_add_gen, f32imm, fpimm>;
-
-defm INT_PTX_ATOM_ADD_G_F64 : F_ATOMIC_2<f64, Float64Regs, ".global", ".f64", ".add",
- atomic_load_add_g, f64imm, fpimm, [hasAtomAddF64]>;
-defm INT_PTX_ATOM_ADD_S_F64 : F_ATOMIC_2<f64, Float64Regs, ".shared", ".f64", ".add",
- atomic_load_add_s, f64imm, fpimm, [hasAtomAddF64]>;
-defm INT_PTX_ATOM_ADD_GEN_F64 : F_ATOMIC_2<f64, Float64Regs, "", ".f64", ".add",
- atomic_load_add_gen, f64imm, fpimm, [hasAtomAddF64]>;
-
-// atom_sub
-
-def atomic_load_sub_i32_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b),
- (atomic_load_sub_i32 node:$a, node:$b)>;
-def atomic_load_sub_i32_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_load_sub_i32 node:$a, node:$b)>;
-def atomic_load_sub_i32_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_load_sub_i32 node:$a, node:$b)>;
-def atomic_load_sub_i64_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b),
- (atomic_load_sub_i64 node:$a, node:$b)>;
-def atomic_load_sub_i64_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_load_sub_i64 node:$a, node:$b)>;
-def atomic_load_sub_i64_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_load_sub_i64 node:$a, node:$b)>;
-
-defm INT_PTX_ATOM_SUB_G_32 : F_ATOMIC_2_NEG<i32, Int32Regs, ".global", "32", ".add",
- atomic_load_sub_i32_g>;
-defm INT_PTX_ATOM_SUB_G_64 : F_ATOMIC_2_NEG<i64, Int64Regs, ".global", "64", ".add",
- atomic_load_sub_i64_g>;
-defm INT_PTX_ATOM_SUB_GEN_32 : F_ATOMIC_2_NEG<i32, Int32Regs, "", "32", ".add",
- atomic_load_sub_i32_gen>;
-defm INT_PTX_ATOM_SUB_GEN_32_USE_G : F_ATOMIC_2_NEG<i32, Int32Regs, ".global", "32",
- ".add", atomic_load_sub_i32_gen>;
-defm INT_PTX_ATOM_SUB_S_32 : F_ATOMIC_2_NEG<i32, Int32Regs, ".shared", "32", ".add",
- atomic_load_sub_i32_s>;
-defm INT_PTX_ATOM_SUB_S_64 : F_ATOMIC_2_NEG<i64, Int64Regs, ".shared", "64", ".add",
- atomic_load_sub_i64_s>;
-defm INT_PTX_ATOM_SUB_GEN_64 : F_ATOMIC_2_NEG<i64, Int64Regs, "", "64", ".add",
- atomic_load_sub_i64_gen>;
-defm INT_PTX_ATOM_SUB_GEN_64_USE_G : F_ATOMIC_2_NEG<i64, Int64Regs, ".global", "64",
- ".add", atomic_load_sub_i64_gen>;
+defm INT_PTX_ATOM_ADD_F16 : F_ATOMIC_2_AS<F16RT, atomic_load_fadd, "add.noftz.f16", [hasSM<70>, hasPTX<63>]>;
+defm INT_PTX_ATOM_ADD_BF16 : F_ATOMIC_2_AS<BF16RT, atomic_load_fadd, "add.noftz.bf16", [hasSM<90>, hasPTX<78>]>;
+defm INT_PTX_ATOM_ADD_F32 : F_ATOMIC_2_AS<F32RT, atomic_load_fadd, "add.f32">;
+defm INT_PTX_ATOM_ADD_F64 : F_ATOMIC_2_AS<F64RT, atomic_load_fadd, "add.f64", [hasAtomAddF64]>;
// atom_swap
-
-def atomic_swap_i32_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b),
- (atomic_swap_i32 node:$a, node:$b)>;
-def atomic_swap_i32_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_swap_i32 node:$a, node:$b)>;
-def atomic_swap_i32_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_swap_i32 node:$a, node:$b)>;
-def atomic_swap_i64_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b),
- (atomic_swap_i64 node:$a, node:$b)>;
-def atomic_swap_i64_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_swap_i64 node:$a, node:$b)>;
-def atomic_swap_i64_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_swap_i64 node:$a, node:$b)>;
-
-defm INT_PTX_ATOM_SWAP_G_32 : F_ATOMIC_2<i32, Int32Regs, ".global", ".b32", ".exch",
- atomic_swap_i32_g, i32imm, imm>;
-defm INT_PTX_ATOM_SWAP_S_32 : F_ATOMIC_2<i32, Int32Regs, ".shared", ".b32", ".exch",
- atomic_swap_i32_s, i32imm, imm>;
-defm INT_PTX_ATOM_SWAP_GEN_32 : F_ATOMIC_2<i32, Int32Regs, "", ".b32", ".exch",
- atomic_swap_i32_gen, i32imm, imm>;
-defm INT_PTX_ATOM_SWAP_GEN_32_USE_G : F_ATOMIC_2<i32, Int32Regs, ".global", ".b32",
- ".exch", atomic_swap_i32_gen, i32imm, imm>;
-defm INT_PTX_ATOM_SWAP_G_64 : F_ATOMIC_2<i64, Int64Regs, ".global", ".b64", ".exch",
- atomic_swap_i64_g, i64imm, imm>;
-defm INT_PTX_ATOM_SWAP_S_64 : F_ATOMIC_2<i64, Int64Regs, ".shared", ".b64", ".exch",
- atomic_swap_i64_s, i64imm, imm>;
-defm INT_PTX_ATOM_SWAP_GEN_64 : F_ATOMIC_2<i64, Int64Regs, "", ".b64", ".exch",
- atomic_swap_i64_gen, i64imm, imm>;
-defm INT_PTX_ATOM_SWAP_GEN_64_USE_G : F_ATOMIC_2<i64, Int64Regs, ".global", ".b64",
- ".exch", atomic_swap_i64_gen, i64imm, imm>;
+defm INT_PTX_ATOM_SWAP_32 : F_ATOMIC_2_AS<I32RT, atomic_swap_i32, "exch.b32">;
+defm INT_PTX_ATOM_SWAP_64 : F_ATOMIC_2_AS<I64RT, atomic_swap_i64, "exch.b64">;
// atom_max
-
-def atomic_load_max_i32_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b)
- , (atomic_load_max_i32 node:$a, node:$b)>;
-def atomic_load_max_i32_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_load_max_i32 node:$a, node:$b)>;
-def atomic_load_max_i32_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_load_max_i32 node:$a, node:$b)>;
-def atomic_load_max_i64_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b)
- , (atomic_load_max_i64 node:$a, node:$b)>;
-def atomic_load_max_i64_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_load_max_i64 node:$a, node:$b)>;
-def atomic_load_max_i64_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_load_max_i64 node:$a, node:$b)>;
-def atomic_load_umax_i32_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b),
- (atomic_load_umax_i32 node:$a, node:$b)>;
-def atomic_load_umax_i32_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_load_umax_i32 node:$a, node:$b)>;
-def atomic_load_umax_i32_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_load_umax_i32 node:$a, node:$b)>;
-def atomic_load_umax_i64_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b),
- (atomic_load_umax_i64 node:$a, node:$b)>;
-def atomic_load_umax_i64_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_load_umax_i64 node:$a, node:$b)>;
-def atomic_load_umax_i64_gen: ATOMIC_GENERIC_CHK<(ops node:$a, node:$b),
- (atomic_load_umax_i64 node:$a, node:$b)>;
-
-defm INT_PTX_ATOM_LOAD_MAX_G_32 : F_ATOMIC_2<i32, Int32Regs, ".global", ".s32",
- ".max", atomic_load_max_i32_g, i32imm, imm>;
-defm INT_PTX_ATOM_LOAD_MAX_S_32 : F_ATOMIC_2<i32, Int32Regs, ".shared", ".s32",
- ".max", atomic_load_max_i32_s, i32imm, imm>;
-defm INT_PTX_ATOM_LOAD_MAX_GEN_32 : F_ATOMIC_2<i32, Int32Regs, "", ".s32", ".max",
- atomic_load_max_i32_gen, i32imm, imm>;
-defm INT_PTX_ATOM_LOAD_MAX_GEN_32_USE_G : F_ATOMIC_2<i32, Int32Regs, ".global",
- ".s32", ".max", atomic_load_max_i32_gen, i32imm, imm>;
-defm INT_PTX_ATOM_LOAD_MAX_G_64 : F_ATOMIC_2<i64, Int64Regs, ".global", ".s64",
- ".max", atomic_load_max_i64_g, i64imm, imm, [hasSM<32>]>;
-defm INT_PTX_ATOM_LOAD_MAX_S_64 : F_ATOMIC_2<i64, Int64Regs, ".shared", ".s64",
- ".max", atomic_load_max_i64_s, i64imm, imm, [hasSM<32>]>;
-defm INT_PTX_ATOM_LOAD_MAX_GEN_64 : F_ATOMIC_2<i64, Int64Regs, "", ".s64", ".max",
- atomic_load_max_i64_gen, i64imm, imm, [hasSM<32>]>;
-defm INT_PTX_ATOM_LOAD_MAX_GEN_64_USE_G : F_ATOMIC_2<i64, Int64Regs, ".global",
- ".s64", ".max", atomic_load_max_i64_gen, i64imm, imm, [hasSM<32>]>;
-defm INT_PTX_ATOM_LOAD_UMAX_G_32 : F_ATOMIC_2<i32, Int32Regs, ".global", ".u32",
- ".max", atomic_load_umax_i32_g, i32imm, imm>;
-defm INT_PTX_ATOM_LOAD_UMAX_S_32 : F_ATOMIC_2<i32, Int32Regs, ".shared", ".u32",
- ".max", atomic_load_umax_i32_s, i32imm, imm>;
-defm INT_PTX_ATOM_LOAD_UMAX_GEN_32 : F_ATOMIC_2<i32, Int32Regs, "", ".u32", ".max",
- atomic_load_umax_i32_gen, i32imm, imm>;
-defm INT_PTX_ATOM_LOAD_UMAX_GEN_32_USE_G : F_ATOMIC_2<i32, Int32Regs, ".global",
- ".u32", ".max", atomic_load_umax_i32_gen, i32imm, imm>;
-defm INT_PTX_ATOM_LOAD_UMAX_G_64 : F_ATOMIC_2<i64, Int64Regs, ".global", ".u64",
- ".max", atomic_load_umax_i64_g, i64imm, imm, [hasSM<32>]>;
-defm INT_PTX_ATOM_LOAD_UMAX_S_64 : F_ATOMIC_2<i64, Int64Regs, ".shared", ".u64",
- ".max", atomic_load_umax_i64_s, i64imm, imm, [hasSM<32>]>;
-defm INT_PTX_ATOM_LOAD_UMAX_GEN_64 : F_ATOMIC_2<i64, Int64Regs, "", ".u64", ".max",
- atomic_load_umax_i64_gen, i64imm, imm, [hasSM<32>]>;
-defm INT_PTX_ATOM_LOAD_UMAX_GEN_64_USE_G : F_ATOMIC_2<i64, Int64Regs, ".global",
- ".u64", ".max", atomic_load_umax_i64_gen, i64imm, imm, [hasSM<32>]>;
+defm INT_PTX_ATOMIC_MAX_32 : F_ATOMIC_2_AS<I32RT, atomic_load_max_i32, "max.s32">;
+defm INT_PTX_ATOMIC_MAX_64 : F_ATOMIC_2_AS<I64RT, atomic_load_max_i64, "max.s64", [hasSM<32>]>;
+defm INT_PTX_ATOMIC_UMAX_32 : F_ATOMIC_2_AS<I32RT, atomic_load_umax_i32, "max.u32">;
+defm INT_PTX_ATOMIC_UMAX_64 : F_ATOMIC_2_AS<I64RT, atomic_load_umax_i64, "max.u64", [hasSM<32>]>;
// atom_min
-
-def atomic_load_min_i32_g: ATOMIC_GLOBAL_CHK<(ops node:$a, node:$b),
- (atomic_load_min_i32 node:$a, node:$b)>;
-def atomic_load_min_i32_s: ATOMIC_SHARED_CHK<(ops node:$a, node:$b),
- (atomic_load_min_i32 node:$a, node:$b)>;
-def atomic_load_min_i32_gen: ATOMIC_GENERI...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
eae00bd
to
c7f7a2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I like the reduction of the boilerplate.
LGTM! Thanks for getting rid of the hand-written instruction definitions. |
"", cas_addrspace_string, []>; | ||
} | ||
foreach t = [I32RT, I64RT] in { | ||
foreach order = ["acquire", "release", "acq_rel", "monotonic"] in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are seq_cst
handled somewhere? (EDIT: not a blocker, just curious)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on the next line says the following:
// Note that AtomicExpand will convert cmpxchg seq_cst to a cmpxchg monotonic with fences around it.
My understanding of atomic lowering is pretty surface level so you'd likely be a better person than me to assess whether this is true and the way we should be doing things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTX does not support atom.seq_cst. Instead, cmpxchg seq_cst is emulated as "fence.sc; atom.cas"
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-atom
Speaking of atomics and vectors, we do have an implementation gap for atomics on 128-bit types like i128 or float4. |
Cleanup lowering of atomic instructions and intrninsics. The TableGen changes are primarily a refactor, though sub variants are now lowered via operation legalization, potentially allowing for more DAG optimization.
Cleanup lowering of atomic instructions and intrninsics. The TableGen changes are primarily a refactor, though sub variants are now lowered via operation legalization, potentially allowing for more DAG optimization.