-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[flang][do concurrent] Re-model reduce
to match reductions are modelled in OpenMP and OpenACC
#145837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…lled in OpenMP and OpenACC This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things.
@llvm/pr-subscribers-flang-fir-hlfir Author: Kareem Ergawy (ergawy) ChangesThis PR proposes re-modelling
Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the Patch is 79.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145837.diff 19 Files Affected:
diff --git a/flang/include/flang/Optimizer/Dialect/FIRAttr.td b/flang/include/flang/Optimizer/Dialect/FIRAttr.td
index 2845080030b92..7bd96ac3ea631 100644
--- a/flang/include/flang/Optimizer/Dialect/FIRAttr.td
+++ b/flang/include/flang/Optimizer/Dialect/FIRAttr.td
@@ -112,7 +112,7 @@ def fir_ReduceOperationEnum : I32BitEnumAttr<"ReduceOperationEnum",
I32BitEnumAttrCaseBit<"MIN", 7, "min">,
I32BitEnumAttrCaseBit<"IAND", 8, "iand">,
I32BitEnumAttrCaseBit<"IOR", 9, "ior">,
- I32BitEnumAttrCaseBit<"EIOR", 10, "eior">
+ I32BitEnumAttrCaseBit<"IEOR", 10, "ieor">
]> {
let separator = ", ";
let cppNamespace = "::fir";
diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td
index 8ac847dd7dd0a..e7eed633de710 100644
--- a/flang/include/flang/Optimizer/Dialect/FIROps.td
+++ b/flang/include/flang/Optimizer/Dialect/FIROps.td
@@ -3518,7 +3518,7 @@ def fir_BoxTotalElementsOp
def YieldOp : fir_Op<"yield",
[Pure, ReturnLike, Terminator,
- ParentOneOf<["LocalitySpecifierOp"]>]> {
+ ParentOneOf<["LocalitySpecifierOp", "DeclareReductionOp"]>]> {
let summary = "loop yield and termination operation";
let description = [{
"fir.yield" yields SSA values from a fir dialect op region and
@@ -3662,6 +3662,103 @@ def fir_LocalitySpecifierOp : fir_Op<"local", [IsolatedFromAbove]> {
let hasRegionVerifier = 1;
}
+def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove,
+ Symbol]> {
+ let summary = "declares a reduction kind";
+ let description = [{
+ Note: this operation is adapted from omp::DeclareReductionOp. There is a lot
+ duplication at the moment. TODO Combined both ops into one. See:
+ https://discourse.llvm.org/t/dialect-for-data-locality-sharing-specifiers-clauses-in-openmp-openacc-and-do-concurrent/86108.
+
+ Declares an `do concurrent` reduction. This requires two mandatory and three
+ optional regions.
+
+ 1. The optional alloc region specifies how to allocate the thread-local
+ reduction value. This region should not contain control flow and all
+ IR should be suitable for inlining straight into an entry block. In
+ the common case this is expected to contain only allocas. It is
+ expected to `fir.yield` the allocated value on all control paths.
+ If allocation is conditional (e.g. only allocate if the mold is
+ allocated), this should be done in the initilizer region and this
+ region not included. The alloc region is not used for by-value
+ reductions (where allocation is implicit).
+ 2. The initializer region specifies how to initialize the thread-local
+ reduction value. This is usually the neutral element of the reduction.
+ For convenience, the region has an argument that contains the value
+ of the reduction accumulator at the start of the reduction. If an alloc
+ region is specified, there is a second block argument containing the
+ address of the allocated memory. The initializer region is expected to
+ `fir.yield` the new value on all control flow paths.
+ 3. The reduction region specifies how to combine two values into one, i.e.
+ the reduction operator. It accepts the two values as arguments and is
+ expected to `fir.yield` the combined value on all control flow paths.
+ 4. The atomic reduction region is optional and specifies how two values
+ can be combined atomically given local accumulator variables. It is
+ expected to store the combined value in the first accumulator variable.
+ 5. The cleanup region is optional and specifies how to clean up any memory
+ allocated by the initializer region. The region has an argument that
+ contains the value of the thread-local reduction accumulator. This will
+ be executed after the reduction has completed.
+
+ Note that the MLIR type system does not allow for type-polymorphic
+ reductions. Separate reduction declarations should be created for different
+ element and accumulator types.
+
+ For initializer and reduction regions, the operand to `fir.yield` must
+ match the parent operation's results.
+ }];
+
+ let arguments = (ins SymbolNameAttr:$sym_name,
+ TypeAttr:$type);
+
+ let regions = (region MaxSizedRegion<1>:$allocRegion,
+ AnyRegion:$initializerRegion,
+ AnyRegion:$reductionRegion,
+ AnyRegion:$atomicReductionRegion,
+ AnyRegion:$cleanupRegion);
+
+ let assemblyFormat = "$sym_name `:` $type attr-dict-with-keyword "
+ "( `alloc` $allocRegion^ )? "
+ "`init` $initializerRegion "
+ "`combiner` $reductionRegion "
+ "( `atomic` $atomicReductionRegion^ )? "
+ "( `cleanup` $cleanupRegion^ )? ";
+
+ let extraClassDeclaration = [{
+ mlir::BlockArgument getAllocMoldArg() {
+ auto ®ion = getAllocRegion();
+ return region.empty() ? nullptr : region.getArgument(0);
+ }
+ mlir::BlockArgument getInitializerMoldArg() {
+ return getInitializerRegion().getArgument(0);
+ }
+ mlir::BlockArgument getInitializerAllocArg() {
+ return getAllocRegion().empty() ?
+ nullptr : getInitializerRegion().getArgument(1);
+ }
+ mlir::BlockArgument getReductionLhsArg() {
+ return getReductionRegion().getArgument(0);
+ }
+ mlir::BlockArgument getReductionRhsArg() {
+ return getReductionRegion().getArgument(1);
+ }
+ mlir::BlockArgument getAtomicReductionLhsArg() {
+ auto ®ion = getAtomicReductionRegion();
+ return region.empty() ? nullptr : region.getArgument(0);
+ }
+ mlir::BlockArgument getAtomicReductionRhsArg() {
+ auto ®ion = getAtomicReductionRegion();
+ return region.empty() ? nullptr : region.getArgument(1);
+ }
+ mlir::BlockArgument getCleanupAllocArg() {
+ auto ®ion = getCleanupRegion();
+ return region.empty() ? nullptr : region.getArgument(0);
+ }
+ }];
+
+ let hasRegionVerifier = 1;
+}
+
def fir_DoConcurrentOp : fir_Op<"do_concurrent",
[SingleBlock, AutomaticAllocationScope]> {
let summary = "do concurrent loop wrapper";
@@ -3700,6 +3797,25 @@ def fir_LocalSpecifier {
);
}
+def fir_ReduceSpecifier {
+ dag arguments = (ins
+ Variadic<AnyType>:$reduce_vars,
+ OptionalAttr<DenseBoolArrayAttr>:$reduce_byref,
+
+ // This introduces redundency in how reductions are modelled. In particular,
+ // a single reduction is represented by 2 attributes:
+ //
+ // 1. `$reduce_syms` which is a list of `DeclareReductionOp`s.
+ // 2. `$reduce_attrs` which is an array of `fir::ReduceAttr` values.
+ //
+ // The first makes it easier to map `do concurrent` to parallization models
+ // (e.g. OpenMP and OpenACC) while the second makes it easier to map it to
+ // nests of `fir.do_loop ... unodered` ops.
+ OptionalAttr<SymbolRefArrayAttr>:$reduce_syms,
+ OptionalAttr<ArrayAttr>:$reduce_attrs
+ );
+}
+
def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
[AttrSizedOperandSegments, DeclareOpInterfaceMethods<LoopLikeOpInterface,
["getLoopInductionVars"]>,
@@ -3709,7 +3825,7 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
let description = [{
An operation that models a Fortran `do concurrent` loop's header and block.
This is a single-region single-block terminator op that is expected to
- terminate the region of a `omp.do_concurrent` wrapper op.
+ terminate the region of a `fir.do_concurrent` wrapper op.
This op borrows from both `scf.parallel` and `fir.do_loop` ops. Similar to
`scf.parallel`, a loop nest takes 3 groups of SSA values as operands that
@@ -3747,8 +3863,6 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
- `lowerBound`: The group of SSA values for the nest's lower bounds.
- `upperBound`: The group of SSA values for the nest's upper bounds.
- `step`: The group of SSA values for the nest's steps.
- - `reduceOperands`: The reduction SSA values, if any.
- - `reduceAttrs`: Attributes to store reduction operations, if any.
- `loopAnnotation`: Loop metadata to be passed down the compiler pipeline to
LLVM.
}];
@@ -3757,12 +3871,12 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
Variadic<Index>:$lowerBound,
Variadic<Index>:$upperBound,
Variadic<Index>:$step,
- Variadic<AnyType>:$reduceOperands,
- OptionalAttr<ArrayAttr>:$reduceAttrs,
OptionalAttr<LoopAnnotationAttr>:$loopAnnotation
);
- let arguments = !con(opArgs, fir_LocalSpecifier.arguments);
+ let arguments = !con(opArgs,
+ fir_LocalSpecifier.arguments,
+ fir_ReduceSpecifier.arguments);
let regions = (region SizedRegion<1>:$region);
@@ -3783,12 +3897,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
getNumLocalOperands());
}
+ mlir::Block::BlockArgListType getRegionReduceArgs() {
+ return getBody()->getArguments().slice(getNumInductionVars()
+ + getNumLocalOperands(),
+ getNumReduceOperands());
+ }
+
/// Number of operands controlling the loop
unsigned getNumControlOperands() { return getLowerBound().size() * 3; }
// Get Number of reduction operands
unsigned getNumReduceOperands() {
- return getReduceOperands().size();
+ return getReduceVars().size();
}
mlir::Operation::operand_range getLocalOperands() {
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 8506b9a984e58..7e8f09a293834 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -12,6 +12,7 @@
#include "flang/Lower/Bridge.h"
+#include "OpenMP/ReductionProcessor.h"
#include "flang/Lower/Allocatable.h"
#include "flang/Lower/CallInterface.h"
#include "flang/Lower/Coarray.h"
@@ -127,9 +128,8 @@ struct IncrementLoopInfo {
bool isConcurrent;
llvm::SmallVector<const Fortran::semantics::Symbol *> localSymList;
llvm::SmallVector<const Fortran::semantics::Symbol *> localInitSymList;
- llvm::SmallVector<
- std::pair<fir::ReduceOperationEnum, const Fortran::semantics::Symbol *>>
- reduceSymList;
+ llvm::SmallVector<const Fortran::semantics::Symbol *> reduceSymList;
+ llvm::SmallVector<fir::ReduceOperationEnum> reduceOperatorList;
llvm::SmallVector<const Fortran::semantics::Symbol *> sharedSymList;
mlir::Value loopVariable = nullptr;
@@ -1981,7 +1981,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
case Fortran::parser::ReductionOperator::Operator::Ior:
return fir::ReduceOperationEnum::IOR;
case Fortran::parser::ReductionOperator::Operator::Ieor:
- return fir::ReduceOperationEnum::EIOR;
+ return fir::ReduceOperationEnum::IEOR;
}
llvm_unreachable("illegal reduction operator");
}
@@ -2015,8 +2015,8 @@ class FirConverter : public Fortran::lower::AbstractConverter {
std::get<Fortran::parser::ReductionOperator>(reduceList->t));
for (const Fortran::parser::Name &x :
std::get<std::list<Fortran::parser::Name>>(reduceList->t)) {
- info.reduceSymList.push_back(
- std::make_pair(reduce_operation, x.symbol));
+ info.reduceSymList.push_back(x.symbol);
+ info.reduceOperatorList.push_back(reduce_operation);
}
}
}
@@ -2077,6 +2077,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
assign.u = Fortran::evaluate::Assignment::BoundsSpec{};
genAssignment(assign);
}
+
for (const Fortran::semantics::Symbol *sym : info.sharedSymList) {
const auto *hostDetails =
sym->detailsIf<Fortran::semantics::HostAssocDetails>();
@@ -2100,6 +2101,46 @@ class FirConverter : public Fortran::lower::AbstractConverter {
}
}
+ llvm::SmallVector<bool> reduceVarByRef;
+ llvm::SmallVector<mlir::Attribute> reductionDeclSymbols;
+ llvm::SmallVector<mlir::Attribute> nestReduceAttrs;
+
+ for (const auto &reduceOp : info.reduceOperatorList) {
+ nestReduceAttrs.push_back(
+ fir::ReduceAttr::get(builder->getContext(), reduceOp));
+ }
+
+ llvm::SmallVector<mlir::Value> reduceVars;
+ Fortran::lower::omp::ReductionProcessor rp;
+ rp.processReductionArguments<fir::DeclareReductionOp>(
+ toLocation(), *this, info.reduceOperatorList, reduceVars,
+ reduceVarByRef, reductionDeclSymbols, info.reduceSymList);
+
+ doConcurrentLoopOp.getReduceVarsMutable().assign(reduceVars);
+ doConcurrentLoopOp.setReduceSymsAttr(
+ reductionDeclSymbols.empty()
+ ? nullptr
+ : mlir::ArrayAttr::get(builder->getContext(),
+ reductionDeclSymbols));
+ doConcurrentLoopOp.setReduceAttrsAttr(
+ nestReduceAttrs.empty()
+ ? nullptr
+ : mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs));
+ doConcurrentLoopOp.setReduceByrefAttr(
+ reduceVarByRef.empty() ? nullptr
+ : mlir::DenseBoolArrayAttr::get(
+ builder->getContext(), reduceVarByRef));
+
+ for (auto [sym, reduceVar] :
+ llvm::zip_equal(info.reduceSymList, reduceVars)) {
+ auto arg = doConcurrentLoopOp.getRegion().begin()->addArgument(
+ reduceVar.getType(), doConcurrentLoopOp.getLoc());
+ bindSymbol(*sym, hlfir::translateToExtendedValue(
+ reduceVar.getLoc(), *builder, hlfir::Entity{arg},
+ /*contiguousHint=*/true)
+ .first);
+ }
+
// Note that allocatable, types with ultimate components, and type
// requiring finalization are forbidden in LOCAL/LOCAL_INIT (F2023 C1130),
// so no clean-up needs to be generated for these entities.
@@ -2191,6 +2232,12 @@ class FirConverter : public Fortran::lower::AbstractConverter {
}
}
+ // Introduce a `do concurrent` scope to bind symbols corresponding to local,
+ // local_init, and reduce region arguments.
+ if (!incrementLoopNestInfo.empty() &&
+ incrementLoopNestInfo.back().isConcurrent)
+ localSymbols.pushScope();
+
// Increment loop begin code. (Infinite/while code was already generated.)
if (!infiniteLoop && !whileCondition)
genFIRIncrementLoopBegin(incrementLoopNestInfo, doStmtEval.dirs);
@@ -2214,6 +2261,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
// This call may generate a branch in some contexts.
genFIR(endDoEval, unstructuredContext);
+
+ if (!incrementLoopNestInfo.empty() &&
+ incrementLoopNestInfo.back().isConcurrent)
+ localSymbols.popScope();
}
/// Generate FIR to evaluate loop control values (lower, upper and step).
@@ -2396,19 +2447,6 @@ class FirConverter : public Fortran::lower::AbstractConverter {
info.stepVariable = builder->createTemporary(loc, stepValue.getType());
builder->create<fir::StoreOp>(loc, stepValue, info.stepVariable);
}
-
- if (genDoConcurrent && nestReduceOperands.empty()) {
- // Create DO CONCURRENT reduce operands and attributes
- for (const auto &reduceSym : info.reduceSymList) {
- const fir::ReduceOperationEnum reduceOperation = reduceSym.first;
- const Fortran::semantics::Symbol *sym = reduceSym.second;
- fir::ExtendedValue exv = getSymbolExtendedValue(*sym, nullptr);
- nestReduceOperands.push_back(fir::getBase(exv));
- auto reduceAttr =
- fir::ReduceAttr::get(builder->getContext(), reduceOperation);
- nestReduceAttrs.push_back(reduceAttr);
- }
- }
}
for (auto [info, lowerValue, upperValue, stepValue] :
@@ -2506,11 +2544,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
builder->setInsertionPointToEnd(loopWrapperOp.getBody());
auto loopOp = builder->create<fir::DoConcurrentLoopOp>(
- loc, nestLBs, nestUBs, nestSts, nestReduceOperands,
- nestReduceAttrs.empty()
- ? nullptr
- : mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs),
- nullptr, /*local_vars=*/std::nullopt, /*local_syms=*/nullptr);
+ loc, nestLBs, nestUBs, nestSts, nullptr, /*local_vars=*/std::nullopt,
+ /*local_syms=*/nullptr, /*reduce_vars=*/std::nullopt,
+ /*reduce_byref=*/nullptr, /*reduce_syms=*/nullptr,
+ /*reduce_attrs=*/nullptr);
llvm::SmallVector<mlir::Type> loopBlockArgTypes(
incrementLoopNestInfo.size(), builder->getIndexType());
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index 7bea427099a28..5aebfc901e8ac 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -12,6 +12,7 @@
#include "ClauseProcessor.h"
#include "Clauses.h"
+#include "ReductionProcessor.h"
#include "Utils.h"
#include "flang/Lower/ConvertExprToHLFIR.h"
@@ -25,6 +26,21 @@ namespace Fortran {
namespace lower {
namespace omp {
+using ReductionModifier =
+ Fortran::lower::omp::clause::Reduction::ReductionModifier;
+
+mlir::omp::ReductionModifier translateReductionModifier(ReductionModifier mod) {
+ switch (mod) {
+ case ReductionModifier::Default:
+ return mlir::omp::ReductionModifier::defaultmod;
+ case ReductionModifier::Inscan:
+ return mlir::omp::ReductionModifier::inscan;
+ case ReductionModifier::Task:
+ return mlir::omp::ReductionModifier::task;
+ }
+ return mlir::omp::ReductionModifier::defaultmod;
+}
+
/// Check for unsupported map operand types.
static void checkMapType(mlir::Location location, mlir::Type type) {
if (auto refType = mlir::dyn_cast<fir::ReferenceType>(type))
@@ -1076,6 +1092,18 @@ bool ClauseProcessor::processIf(
});
return found;
}
+
+template <typename T>
+void collectReductionSyms(
+ const T &reduction,
+ llvm::SmallVectorImpl<const semantics::Symbol *> &reductionSyms) {
+ const auto &objectList{std::get<omp::ObjectList>(reduction.t)};
+ for (const Object &object : objectList) {
+ const semantics::Symbol *symbol = object.sym();
+ reductionSyms.push_back(symbol);
+ }
+}
+
bool ClauseProcessor::processInReduction(
mlir::Location currentLocation, mlir::omp::InReductionClauseOps &result,
llvm::SmallVectorImpl<const semantics::Symbol *> &outReductionSyms) const {
@@ -1085,10 +1113,14 @@ bool ClauseProcessor::processInReduction(
llvm::SmallVector<bool> inReduceVarByRef;
llvm::SmallVector<mlir::Attribute> inReductionDeclSymbols;
llvm::SmallVector<const semantics::Symbol *> inReductionSyms;
+ collectReductionSyms(clause, inReductionSyms);
+
ReductionProcessor rp;
- rp.processReductionArguments<omp::clause::InReduction>(
- currentLocation, converter, clause, inReductionVars,
- inReduceVarByRef, inReductionDeclSymbols, inReductionSyms);
+ rp.processReductionArguments<mlir::omp::DeclareReductionOp>(
+ currentLocation, converter,
+ std::get<typename omp::clause::ReductionOperatorList>(clause.t),
+ inReductionVars, inReduceVarByRef, inReductionDeclSymbols,
+ inReductionSyms);
// Copy local lists into the output.
llvm::copy(inReductionVars, std::back_inserter(result.inReductionVars));
@@ -1416,10 +1448,23 @@ bool ClauseProcessor::processReduction(
llvm::SmallVector<bool> reduceVarByRef;
llvm::SmallVector<mlir::Attribute> reductionDeclSymbols;
llvm::SmallVector<const semantics::Symbol *> reductionSyms;
+ collectReductionSyms(clause, reductionSyms);
+
+ auto mod = std::get<std::optional<ReductionModifier>>(clause.t);
+ if (mod.has_value()) {
+ if (mod.value() == ReductionModifier::Task)
+ TODO(currentLocation, "Reduction modifier `task` is not supported");
+ else
+ res...
[truncated]
|
@llvm/pr-subscribers-flang-codegen Author: Kareem Ergawy (ergawy) ChangesThis PR proposes re-modelling
Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the Patch is 79.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145837.diff 19 Files Affected:
diff --git a/flang/include/flang/Optimizer/Dialect/FIRAttr.td b/flang/include/flang/Optimizer/Dialect/FIRAttr.td
index 2845080030b92..7bd96ac3ea631 100644
--- a/flang/include/flang/Optimizer/Dialect/FIRAttr.td
+++ b/flang/include/flang/Optimizer/Dialect/FIRAttr.td
@@ -112,7 +112,7 @@ def fir_ReduceOperationEnum : I32BitEnumAttr<"ReduceOperationEnum",
I32BitEnumAttrCaseBit<"MIN", 7, "min">,
I32BitEnumAttrCaseBit<"IAND", 8, "iand">,
I32BitEnumAttrCaseBit<"IOR", 9, "ior">,
- I32BitEnumAttrCaseBit<"EIOR", 10, "eior">
+ I32BitEnumAttrCaseBit<"IEOR", 10, "ieor">
]> {
let separator = ", ";
let cppNamespace = "::fir";
diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td
index 8ac847dd7dd0a..e7eed633de710 100644
--- a/flang/include/flang/Optimizer/Dialect/FIROps.td
+++ b/flang/include/flang/Optimizer/Dialect/FIROps.td
@@ -3518,7 +3518,7 @@ def fir_BoxTotalElementsOp
def YieldOp : fir_Op<"yield",
[Pure, ReturnLike, Terminator,
- ParentOneOf<["LocalitySpecifierOp"]>]> {
+ ParentOneOf<["LocalitySpecifierOp", "DeclareReductionOp"]>]> {
let summary = "loop yield and termination operation";
let description = [{
"fir.yield" yields SSA values from a fir dialect op region and
@@ -3662,6 +3662,103 @@ def fir_LocalitySpecifierOp : fir_Op<"local", [IsolatedFromAbove]> {
let hasRegionVerifier = 1;
}
+def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove,
+ Symbol]> {
+ let summary = "declares a reduction kind";
+ let description = [{
+ Note: this operation is adapted from omp::DeclareReductionOp. There is a lot
+ duplication at the moment. TODO Combined both ops into one. See:
+ https://discourse.llvm.org/t/dialect-for-data-locality-sharing-specifiers-clauses-in-openmp-openacc-and-do-concurrent/86108.
+
+ Declares an `do concurrent` reduction. This requires two mandatory and three
+ optional regions.
+
+ 1. The optional alloc region specifies how to allocate the thread-local
+ reduction value. This region should not contain control flow and all
+ IR should be suitable for inlining straight into an entry block. In
+ the common case this is expected to contain only allocas. It is
+ expected to `fir.yield` the allocated value on all control paths.
+ If allocation is conditional (e.g. only allocate if the mold is
+ allocated), this should be done in the initilizer region and this
+ region not included. The alloc region is not used for by-value
+ reductions (where allocation is implicit).
+ 2. The initializer region specifies how to initialize the thread-local
+ reduction value. This is usually the neutral element of the reduction.
+ For convenience, the region has an argument that contains the value
+ of the reduction accumulator at the start of the reduction. If an alloc
+ region is specified, there is a second block argument containing the
+ address of the allocated memory. The initializer region is expected to
+ `fir.yield` the new value on all control flow paths.
+ 3. The reduction region specifies how to combine two values into one, i.e.
+ the reduction operator. It accepts the two values as arguments and is
+ expected to `fir.yield` the combined value on all control flow paths.
+ 4. The atomic reduction region is optional and specifies how two values
+ can be combined atomically given local accumulator variables. It is
+ expected to store the combined value in the first accumulator variable.
+ 5. The cleanup region is optional and specifies how to clean up any memory
+ allocated by the initializer region. The region has an argument that
+ contains the value of the thread-local reduction accumulator. This will
+ be executed after the reduction has completed.
+
+ Note that the MLIR type system does not allow for type-polymorphic
+ reductions. Separate reduction declarations should be created for different
+ element and accumulator types.
+
+ For initializer and reduction regions, the operand to `fir.yield` must
+ match the parent operation's results.
+ }];
+
+ let arguments = (ins SymbolNameAttr:$sym_name,
+ TypeAttr:$type);
+
+ let regions = (region MaxSizedRegion<1>:$allocRegion,
+ AnyRegion:$initializerRegion,
+ AnyRegion:$reductionRegion,
+ AnyRegion:$atomicReductionRegion,
+ AnyRegion:$cleanupRegion);
+
+ let assemblyFormat = "$sym_name `:` $type attr-dict-with-keyword "
+ "( `alloc` $allocRegion^ )? "
+ "`init` $initializerRegion "
+ "`combiner` $reductionRegion "
+ "( `atomic` $atomicReductionRegion^ )? "
+ "( `cleanup` $cleanupRegion^ )? ";
+
+ let extraClassDeclaration = [{
+ mlir::BlockArgument getAllocMoldArg() {
+ auto ®ion = getAllocRegion();
+ return region.empty() ? nullptr : region.getArgument(0);
+ }
+ mlir::BlockArgument getInitializerMoldArg() {
+ return getInitializerRegion().getArgument(0);
+ }
+ mlir::BlockArgument getInitializerAllocArg() {
+ return getAllocRegion().empty() ?
+ nullptr : getInitializerRegion().getArgument(1);
+ }
+ mlir::BlockArgument getReductionLhsArg() {
+ return getReductionRegion().getArgument(0);
+ }
+ mlir::BlockArgument getReductionRhsArg() {
+ return getReductionRegion().getArgument(1);
+ }
+ mlir::BlockArgument getAtomicReductionLhsArg() {
+ auto ®ion = getAtomicReductionRegion();
+ return region.empty() ? nullptr : region.getArgument(0);
+ }
+ mlir::BlockArgument getAtomicReductionRhsArg() {
+ auto ®ion = getAtomicReductionRegion();
+ return region.empty() ? nullptr : region.getArgument(1);
+ }
+ mlir::BlockArgument getCleanupAllocArg() {
+ auto ®ion = getCleanupRegion();
+ return region.empty() ? nullptr : region.getArgument(0);
+ }
+ }];
+
+ let hasRegionVerifier = 1;
+}
+
def fir_DoConcurrentOp : fir_Op<"do_concurrent",
[SingleBlock, AutomaticAllocationScope]> {
let summary = "do concurrent loop wrapper";
@@ -3700,6 +3797,25 @@ def fir_LocalSpecifier {
);
}
+def fir_ReduceSpecifier {
+ dag arguments = (ins
+ Variadic<AnyType>:$reduce_vars,
+ OptionalAttr<DenseBoolArrayAttr>:$reduce_byref,
+
+ // This introduces redundency in how reductions are modelled. In particular,
+ // a single reduction is represented by 2 attributes:
+ //
+ // 1. `$reduce_syms` which is a list of `DeclareReductionOp`s.
+ // 2. `$reduce_attrs` which is an array of `fir::ReduceAttr` values.
+ //
+ // The first makes it easier to map `do concurrent` to parallization models
+ // (e.g. OpenMP and OpenACC) while the second makes it easier to map it to
+ // nests of `fir.do_loop ... unodered` ops.
+ OptionalAttr<SymbolRefArrayAttr>:$reduce_syms,
+ OptionalAttr<ArrayAttr>:$reduce_attrs
+ );
+}
+
def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
[AttrSizedOperandSegments, DeclareOpInterfaceMethods<LoopLikeOpInterface,
["getLoopInductionVars"]>,
@@ -3709,7 +3825,7 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
let description = [{
An operation that models a Fortran `do concurrent` loop's header and block.
This is a single-region single-block terminator op that is expected to
- terminate the region of a `omp.do_concurrent` wrapper op.
+ terminate the region of a `fir.do_concurrent` wrapper op.
This op borrows from both `scf.parallel` and `fir.do_loop` ops. Similar to
`scf.parallel`, a loop nest takes 3 groups of SSA values as operands that
@@ -3747,8 +3863,6 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
- `lowerBound`: The group of SSA values for the nest's lower bounds.
- `upperBound`: The group of SSA values for the nest's upper bounds.
- `step`: The group of SSA values for the nest's steps.
- - `reduceOperands`: The reduction SSA values, if any.
- - `reduceAttrs`: Attributes to store reduction operations, if any.
- `loopAnnotation`: Loop metadata to be passed down the compiler pipeline to
LLVM.
}];
@@ -3757,12 +3871,12 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
Variadic<Index>:$lowerBound,
Variadic<Index>:$upperBound,
Variadic<Index>:$step,
- Variadic<AnyType>:$reduceOperands,
- OptionalAttr<ArrayAttr>:$reduceAttrs,
OptionalAttr<LoopAnnotationAttr>:$loopAnnotation
);
- let arguments = !con(opArgs, fir_LocalSpecifier.arguments);
+ let arguments = !con(opArgs,
+ fir_LocalSpecifier.arguments,
+ fir_ReduceSpecifier.arguments);
let regions = (region SizedRegion<1>:$region);
@@ -3783,12 +3897,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
getNumLocalOperands());
}
+ mlir::Block::BlockArgListType getRegionReduceArgs() {
+ return getBody()->getArguments().slice(getNumInductionVars()
+ + getNumLocalOperands(),
+ getNumReduceOperands());
+ }
+
/// Number of operands controlling the loop
unsigned getNumControlOperands() { return getLowerBound().size() * 3; }
// Get Number of reduction operands
unsigned getNumReduceOperands() {
- return getReduceOperands().size();
+ return getReduceVars().size();
}
mlir::Operation::operand_range getLocalOperands() {
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 8506b9a984e58..7e8f09a293834 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -12,6 +12,7 @@
#include "flang/Lower/Bridge.h"
+#include "OpenMP/ReductionProcessor.h"
#include "flang/Lower/Allocatable.h"
#include "flang/Lower/CallInterface.h"
#include "flang/Lower/Coarray.h"
@@ -127,9 +128,8 @@ struct IncrementLoopInfo {
bool isConcurrent;
llvm::SmallVector<const Fortran::semantics::Symbol *> localSymList;
llvm::SmallVector<const Fortran::semantics::Symbol *> localInitSymList;
- llvm::SmallVector<
- std::pair<fir::ReduceOperationEnum, const Fortran::semantics::Symbol *>>
- reduceSymList;
+ llvm::SmallVector<const Fortran::semantics::Symbol *> reduceSymList;
+ llvm::SmallVector<fir::ReduceOperationEnum> reduceOperatorList;
llvm::SmallVector<const Fortran::semantics::Symbol *> sharedSymList;
mlir::Value loopVariable = nullptr;
@@ -1981,7 +1981,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
case Fortran::parser::ReductionOperator::Operator::Ior:
return fir::ReduceOperationEnum::IOR;
case Fortran::parser::ReductionOperator::Operator::Ieor:
- return fir::ReduceOperationEnum::EIOR;
+ return fir::ReduceOperationEnum::IEOR;
}
llvm_unreachable("illegal reduction operator");
}
@@ -2015,8 +2015,8 @@ class FirConverter : public Fortran::lower::AbstractConverter {
std::get<Fortran::parser::ReductionOperator>(reduceList->t));
for (const Fortran::parser::Name &x :
std::get<std::list<Fortran::parser::Name>>(reduceList->t)) {
- info.reduceSymList.push_back(
- std::make_pair(reduce_operation, x.symbol));
+ info.reduceSymList.push_back(x.symbol);
+ info.reduceOperatorList.push_back(reduce_operation);
}
}
}
@@ -2077,6 +2077,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
assign.u = Fortran::evaluate::Assignment::BoundsSpec{};
genAssignment(assign);
}
+
for (const Fortran::semantics::Symbol *sym : info.sharedSymList) {
const auto *hostDetails =
sym->detailsIf<Fortran::semantics::HostAssocDetails>();
@@ -2100,6 +2101,46 @@ class FirConverter : public Fortran::lower::AbstractConverter {
}
}
+ llvm::SmallVector<bool> reduceVarByRef;
+ llvm::SmallVector<mlir::Attribute> reductionDeclSymbols;
+ llvm::SmallVector<mlir::Attribute> nestReduceAttrs;
+
+ for (const auto &reduceOp : info.reduceOperatorList) {
+ nestReduceAttrs.push_back(
+ fir::ReduceAttr::get(builder->getContext(), reduceOp));
+ }
+
+ llvm::SmallVector<mlir::Value> reduceVars;
+ Fortran::lower::omp::ReductionProcessor rp;
+ rp.processReductionArguments<fir::DeclareReductionOp>(
+ toLocation(), *this, info.reduceOperatorList, reduceVars,
+ reduceVarByRef, reductionDeclSymbols, info.reduceSymList);
+
+ doConcurrentLoopOp.getReduceVarsMutable().assign(reduceVars);
+ doConcurrentLoopOp.setReduceSymsAttr(
+ reductionDeclSymbols.empty()
+ ? nullptr
+ : mlir::ArrayAttr::get(builder->getContext(),
+ reductionDeclSymbols));
+ doConcurrentLoopOp.setReduceAttrsAttr(
+ nestReduceAttrs.empty()
+ ? nullptr
+ : mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs));
+ doConcurrentLoopOp.setReduceByrefAttr(
+ reduceVarByRef.empty() ? nullptr
+ : mlir::DenseBoolArrayAttr::get(
+ builder->getContext(), reduceVarByRef));
+
+ for (auto [sym, reduceVar] :
+ llvm::zip_equal(info.reduceSymList, reduceVars)) {
+ auto arg = doConcurrentLoopOp.getRegion().begin()->addArgument(
+ reduceVar.getType(), doConcurrentLoopOp.getLoc());
+ bindSymbol(*sym, hlfir::translateToExtendedValue(
+ reduceVar.getLoc(), *builder, hlfir::Entity{arg},
+ /*contiguousHint=*/true)
+ .first);
+ }
+
// Note that allocatable, types with ultimate components, and type
// requiring finalization are forbidden in LOCAL/LOCAL_INIT (F2023 C1130),
// so no clean-up needs to be generated for these entities.
@@ -2191,6 +2232,12 @@ class FirConverter : public Fortran::lower::AbstractConverter {
}
}
+ // Introduce a `do concurrent` scope to bind symbols corresponding to local,
+ // local_init, and reduce region arguments.
+ if (!incrementLoopNestInfo.empty() &&
+ incrementLoopNestInfo.back().isConcurrent)
+ localSymbols.pushScope();
+
// Increment loop begin code. (Infinite/while code was already generated.)
if (!infiniteLoop && !whileCondition)
genFIRIncrementLoopBegin(incrementLoopNestInfo, doStmtEval.dirs);
@@ -2214,6 +2261,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
// This call may generate a branch in some contexts.
genFIR(endDoEval, unstructuredContext);
+
+ if (!incrementLoopNestInfo.empty() &&
+ incrementLoopNestInfo.back().isConcurrent)
+ localSymbols.popScope();
}
/// Generate FIR to evaluate loop control values (lower, upper and step).
@@ -2396,19 +2447,6 @@ class FirConverter : public Fortran::lower::AbstractConverter {
info.stepVariable = builder->createTemporary(loc, stepValue.getType());
builder->create<fir::StoreOp>(loc, stepValue, info.stepVariable);
}
-
- if (genDoConcurrent && nestReduceOperands.empty()) {
- // Create DO CONCURRENT reduce operands and attributes
- for (const auto &reduceSym : info.reduceSymList) {
- const fir::ReduceOperationEnum reduceOperation = reduceSym.first;
- const Fortran::semantics::Symbol *sym = reduceSym.second;
- fir::ExtendedValue exv = getSymbolExtendedValue(*sym, nullptr);
- nestReduceOperands.push_back(fir::getBase(exv));
- auto reduceAttr =
- fir::ReduceAttr::get(builder->getContext(), reduceOperation);
- nestReduceAttrs.push_back(reduceAttr);
- }
- }
}
for (auto [info, lowerValue, upperValue, stepValue] :
@@ -2506,11 +2544,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
builder->setInsertionPointToEnd(loopWrapperOp.getBody());
auto loopOp = builder->create<fir::DoConcurrentLoopOp>(
- loc, nestLBs, nestUBs, nestSts, nestReduceOperands,
- nestReduceAttrs.empty()
- ? nullptr
- : mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs),
- nullptr, /*local_vars=*/std::nullopt, /*local_syms=*/nullptr);
+ loc, nestLBs, nestUBs, nestSts, nullptr, /*local_vars=*/std::nullopt,
+ /*local_syms=*/nullptr, /*reduce_vars=*/std::nullopt,
+ /*reduce_byref=*/nullptr, /*reduce_syms=*/nullptr,
+ /*reduce_attrs=*/nullptr);
llvm::SmallVector<mlir::Type> loopBlockArgTypes(
incrementLoopNestInfo.size(), builder->getIndexType());
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index 7bea427099a28..5aebfc901e8ac 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -12,6 +12,7 @@
#include "ClauseProcessor.h"
#include "Clauses.h"
+#include "ReductionProcessor.h"
#include "Utils.h"
#include "flang/Lower/ConvertExprToHLFIR.h"
@@ -25,6 +26,21 @@ namespace Fortran {
namespace lower {
namespace omp {
+using ReductionModifier =
+ Fortran::lower::omp::clause::Reduction::ReductionModifier;
+
+mlir::omp::ReductionModifier translateReductionModifier(ReductionModifier mod) {
+ switch (mod) {
+ case ReductionModifier::Default:
+ return mlir::omp::ReductionModifier::defaultmod;
+ case ReductionModifier::Inscan:
+ return mlir::omp::ReductionModifier::inscan;
+ case ReductionModifier::Task:
+ return mlir::omp::ReductionModifier::task;
+ }
+ return mlir::omp::ReductionModifier::defaultmod;
+}
+
/// Check for unsupported map operand types.
static void checkMapType(mlir::Location location, mlir::Type type) {
if (auto refType = mlir::dyn_cast<fir::ReferenceType>(type))
@@ -1076,6 +1092,18 @@ bool ClauseProcessor::processIf(
});
return found;
}
+
+template <typename T>
+void collectReductionSyms(
+ const T &reduction,
+ llvm::SmallVectorImpl<const semantics::Symbol *> &reductionSyms) {
+ const auto &objectList{std::get<omp::ObjectList>(reduction.t)};
+ for (const Object &object : objectList) {
+ const semantics::Symbol *symbol = object.sym();
+ reductionSyms.push_back(symbol);
+ }
+}
+
bool ClauseProcessor::processInReduction(
mlir::Location currentLocation, mlir::omp::InReductionClauseOps &result,
llvm::SmallVectorImpl<const semantics::Symbol *> &outReductionSyms) const {
@@ -1085,10 +1113,14 @@ bool ClauseProcessor::processInReduction(
llvm::SmallVector<bool> inReduceVarByRef;
llvm::SmallVector<mlir::Attribute> inReductionDeclSymbols;
llvm::SmallVector<const semantics::Symbol *> inReductionSyms;
+ collectReductionSyms(clause, inReductionSyms);
+
ReductionProcessor rp;
- rp.processReductionArguments<omp::clause::InReduction>(
- currentLocation, converter, clause, inReductionVars,
- inReduceVarByRef, inReductionDeclSymbols, inReductionSyms);
+ rp.processReductionArguments<mlir::omp::DeclareReductionOp>(
+ currentLocation, converter,
+ std::get<typename omp::clause::ReductionOperatorList>(clause.t),
+ inReductionVars, inReduceVarByRef, inReductionDeclSymbols,
+ inReductionSyms);
// Copy local lists into the output.
llvm::copy(inReductionVars, std::back_inserter(result.inReductionVars));
@@ -1416,10 +1448,23 @@ bool ClauseProcessor::processReduction(
llvm::SmallVector<bool> reduceVarByRef;
llvm::SmallVector<mlir::Attribute> reductionDeclSymbols;
llvm::SmallVector<const semantics::Symbol *> reductionSyms;
+ collectReductionSyms(clause, reductionSyms);
+
+ auto mod = std::get<std::optional<ReductionModifier>>(clause.t);
+ if (mod.has_value()) {
+ if (mod.value() == ReductionModifier::Task)
+ TODO(currentLocation, "Reduction modifier `task` is not supported");
+ else
+ res...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
let summary = "declares a reduction kind"; | ||
let description = [{ | ||
Note: this operation is adapted from omp::DeclareReductionOp. There is a lot | ||
duplication at the moment. TODO Combined both ops into one. See: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
duplication at the moment. TODO Combined both ops into one. See: | |
duplication at the moment. TODO Combine both ops into one. See: |
1. The optional alloc region specifies how to allocate the thread-local | ||
reduction value. This region should not contain control flow and all | ||
IR should be suitable for inlining straight into an entry block. In | ||
the common case this is expected to contain only allocas. It is | ||
expected to `fir.yield` the allocated value on all control paths. | ||
If allocation is conditional (e.g. only allocate if the mold is | ||
allocated), this should be done in the initilizer region and this | ||
region not included. The alloc region is not used for by-value | ||
reductions (where allocation is implicit). | ||
2. The initializer region specifies how to initialize the thread-local | ||
reduction value. This is usually the neutral element of the reduction. | ||
For convenience, the region has an argument that contains the value | ||
of the reduction accumulator at the start of the reduction. If an alloc | ||
region is specified, there is a second block argument containing the | ||
address of the allocated memory. The initializer region is expected to | ||
`fir.yield` the new value on all control flow paths. | ||
3. The reduction region specifies how to combine two values into one, i.e. | ||
the reduction operator. It accepts the two values as arguments and is | ||
expected to `fir.yield` the combined value on all control flow paths. | ||
4. The atomic reduction region is optional and specifies how two values | ||
can be combined atomically given local accumulator variables. It is | ||
expected to store the combined value in the first accumulator variable. | ||
5. The cleanup region is optional and specifies how to clean up any memory | ||
allocated by the initializer region. The region has an argument that | ||
contains the value of the thread-local reduction accumulator. This will | ||
be executed after the reduction has completed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the OpenMP dialect I had to write these to be very vague to keep it not flang dependent. In the FIR dialect we can explain this more clearly by referring to boxes, ALLOCATABLE, etc.
A rough example
- Alloc region allocas the box and does nothing else
- init region allocates memory for the box to point to, sets the metadata fields, initializes the memory with the neutral value for the reduction
The whole design is a bit convoluted by the presence of by-value reductions. These were originally kept in case there was a performance penalty to by-ref for trivial types, but we haven't seen that penalty show up for privatization so this adds needless complexity. But I understand this is already a big PR and you may not want to change more things at once.
? nullptr | ||
: mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs), | ||
nullptr, /*local_vars=*/std::nullopt, /*local_syms=*/nullptr); | ||
loc, nestLBs, nestUBs, nestSts, nullptr, /*local_vars=*/std::nullopt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loc, nestLBs, nestUBs, nestSts, nullptr, /*local_vars=*/std::nullopt, | |
loc, nestLBs, nestUBs, nestSts, /*reduceOperands=*/nullptr, /*local_vars=*/std::nullopt, |
for (const auto &reduceOp : info.reduceOperatorList) { | ||
nestReduceAttrs.push_back( | ||
fir::ReduceAttr::get(builder->getContext(), reduceOp)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
braces
duplication at the moment. TODO Combined both ops into one. See: | ||
https://discourse.llvm.org/t/dialect-for-data-locality-sharing-specifiers-clauses-in-openmp-openacc-and-do-concurrent/86108. | ||
|
||
Declares an `do concurrent` reduction. This requires two mandatory and three |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declares an `do concurrent` reduction. This requires two mandatory and three | |
Declares a `do concurrent` reduction. This requires two mandatory and three |
?
This PR proposes re-modelling
reduce
specifiers to match OpenMP and OpenACC. In particular, this PR includes the following:fir
op:fir.delcare_reduction
which is identical to OpenMP'somp.declare_reduction
op.reduce
clause onfir.do_concurrent.loop
to use the new op.ReductionProcessor
component to emit reductions fordo concurrent
just like we do for OpenMP. To do this, theReductionProcessor
had to be refactored to be more generalized.do concurrent
tofir.loop ... unordered
nests using the new reduction model.Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the
fir
table-gen changes todo concurrent
. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things.This PR goes in the same direction we went for
private/local
speicifiers. Now thedo concurrent
and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully.