[mlir-hlo] Added BufferReuse Optimization. #48883

ghost · 2021-05-03T09:59:41Z

In this PR, we introduce a new optimization that reuses already allocated buffers to reduce memory consumption. The optimization consists of two steps. First, for each buffer, we find a list of buffers that are potential reuses. A potential reuse has the following properties:

- the types are compatible
- there is no interference in the UserangeAnalysis ([mlir-hlo] Added Userange Analysis for Buffers. #48847)
- the dominance is still given

The second step is a fixpoint iteration over the potential reuses. This step is divided into two substeps:

- try to assign possible reuses for each buffer
- update the potential reuses based on the assignments from the first substep.

After the distribution of all possible reusable buffers is done, they are actually replaced.

For example:                        Result:
func @simpleReuse(%arg0: i1) {      func @simpleReuse(%arg0: i1) {
  %0 = alloc()                        %0 = alloc()
  %1 = alloc()
  cond_br %arg0, ^bb1, ^bb2           cond_br %arg0, ^bb1, ^bb2
 ^bb1:                               ^bb1:
  use(%0)                             use(%0)
  br ^bb3                             br ^bb3
 ^bb2:                               ^bb2:
  use(%1)                             use(%0)
  br ^bb3                             br ^bb3
 ^bb3:                               ^bb3:
  return                              return
}                                   }

In this simple example %1 can be replaced with %0, because all requirements mentioned above are fulfilled.
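The first step of the description can be sketched in isolation. This is a minimal illustration, not the code from the PR: `Buffer`, `ReuseChecks`, `canReplace` and `collectPotentialReuses` are hypothetical names, buffers are modeled as plain integers, and the three properties are passed in as predicates. The fixpoint iteration of the second step is not shown.

```cpp
#include <functional>
#include <map>
#include <vector>

// Hypothetical stand-in for an mlir::Value produced by an alloc.
using Buffer = int;

// The three properties from the description, modeled as predicates.
// canReplace(a, b) answers: may the uses of b be rewritten to use a?
struct ReuseChecks {
  std::function<bool(Buffer, Buffer)> typesCompatible;
  std::function<bool(Buffer, Buffer)> rangesInterfere;
  std::function<bool(Buffer, Buffer)> dominatesAllUses;

  bool canReplace(Buffer a, Buffer b) const {
    return typesCompatible(a, b) && !rangesInterfere(a, b) &&
           dominatesAllUses(a, b);
  }
};

// Step 1: for each buffer, list the buffers it could potentially replace.
std::map<Buffer, std::vector<Buffer>> collectPotentialReuses(
    const std::vector<Buffer>& buffers, const ReuseChecks& checks) {
  std::map<Buffer, std::vector<Buffer>> potentialReuseMap;
  for (Buffer a : buffers)
    for (Buffer b : buffers)
      if (a != b && checks.canReplace(a, b)) potentialReuseMap[a].push_back(b);
  return potentialReuseMap;
}
```

For `simpleReuse`, modeling `%0` and `%1` as buffers 0 and 1 with all three checks passing gives each buffer the other as a potential reuse; the second step then has to pick one direction.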

This PR is a follow up to #48847, in which we introduced the UserangeAnalysis.

sanjoy

Should this be split into multiple PRs, one with just the Userange analysis and one with the buffer_reuse optimization?

sanjoy · 2021-05-04T02:30:53Z

tensorflow/compiler/mlir/hlo/include/mlir-hlo/Transforms/passes.td

+
+def BufferReuse : FunctionPass<"buffer-reuse"> {
+  let summary = "Reuses already allocated buffers to save allocation "
+                "operations if all criteria are met.";


Maybe just say "is provably safe" instead of "if all criteria are met"? IMO that's a bit more exact.

sanjoy · 2021-05-04T02:33:50Z

tensorflow/compiler/mlir/hlo/include/mlir-hlo/Analysis/userange_analysis.h

+/// every alloc value.
+class UserangeAnalysis {
+public:
+  using UseInterval = std::pair<size_t, size_t>;


Maybe add a brief comment on what the size_ts in UseInterval mean?

sanjoy · 2021-05-04T03:05:07Z

tensorflow/compiler/mlir/hlo/include/mlir-hlo/Analysis/userange_analysis.h

+  /// Computes the ID for the operation. If the operation contains operands
+  /// which have read effects, the returning ID will be odd.


Does it make sense to have a richer encoding here instead of implicitly encoding information in the last bit?

Yes, you are right. This part is still work in progress. I will change the PR of the UserangeAnalysis to a draft and work on this issue. Also I tried to split this PR into 2 parts. Part 1 can be found here: #48847.
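One way to read the suggestion is to carry the read-effect flag as its own field instead of folding it into the parity of the ID. The sketch below is an assumption about what such an encoding could look like; `OperationId`, `encodeImplicit` and `decodeImplicit` are hypothetical names, and the formula `2 * position + bit` is only one possible scheme that yields odd IDs for operations with read effects.

```cpp
#include <cstddef>

// Hypothetical explicit encoding: position and read effect are separate.
struct OperationId {
  std::size_t position;  // position of the operation in program order
  bool hasReadEffect;    // true if an operand of the operation is read
};

// Assumed implicit scheme: the last bit carries the read-effect flag,
// so IDs of operations with read effects are odd.
constexpr std::size_t encodeImplicit(OperationId id) {
  return 2 * id.position + (id.hasReadEffect ? 1 : 0);
}

// Recovers the explicit form from an implicitly encoded ID.
constexpr OperationId decodeImplicit(std::size_t raw) {
  return OperationId{raw / 2, raw % 2 == 1};
}
```

With the explicit struct, callers no longer need to know that parity has a meaning.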

sanjoy · 2021-05-04T04:15:27Z

tensorflow/compiler/mlir/hlo/lib/Analysis/userange_analysis.cc

+  /// Constructs an Userange builder.
+  UserangeInfoBuilder(Liveness liveness, ValueSetT values,
+                      OperationListT opList)
+      : values(values), opList(opList), liveness(liveness) {}


Maybe std::move these in?
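A minimal sketch of the suggested change, using standard containers as hypothetical stand-ins for `ValueSetT` and `OperationListT` and leaving out the `Liveness` parameter. The by-value parameters are moved into the members, so the constructor does not copy them a second time.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the container types used in the PR.
using ValueSetT = std::vector<int>;
using OperationListT = std::vector<std::string>;

class UserangeInfoBuilderSketch {
 public:
  // Parameters are taken by value and moved into the members. Inside the
  // parentheses the names refer to the parameters, not the members.
  UserangeInfoBuilderSketch(ValueSetT values, OperationListT opList)
      : values(std::move(values)), opList(std::move(opList)) {}

  const ValueSetT& getValues() const { return values; }
  const OperationListT& getOpList() const { return opList; }

 private:
  ValueSetT values;
  OperationListT opList;
};
```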

sanjoy · 2021-05-04T15:01:21Z

tensorflow/compiler/mlir/hlo/lib/Analysis/userange_analysis.cc

+  for (auto iterA = a.begin(), endA = a.end();
+       iterA != endA && iterB != endB;) {
+    // iterA is strictly before iterB => increment iterA.
+    if (iterA->second < iterB->first) ++iterA;


Might be better to encapsulate the UserangeIntervals in a class of its own and provide methods like strictlyBefore?
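A possible shape for such a class is sketched below. It assumes the intervals are end-inclusive ranges of operation IDs; the field names `start` and `end` and the `overlaps` helper are hypothetical and only illustrate the idea.

```cpp
#include <cstddef>

// Hypothetical replacement for std::pair<size_t, size_t>: a closed interval
// [start, end] of operation IDs during which a buffer is in use.
struct UseInterval {
  std::size_t start;  // assumed: ID of the first operation in the interval
  std::size_t end;    // assumed: ID of the last operation (inclusive)

  // True if this interval ends before `other` begins.
  bool strictlyBefore(const UseInterval& other) const {
    return end < other.start;
  }

  // True if the two intervals share at least one operation ID.
  bool overlaps(const UseInterval& other) const {
    return !strictlyBefore(other) && !other.strictlyBefore(*this);
  }
};
```

The loop condition `iterA->second < iterB->first` would then read `iterA->strictlyBefore(*iterB)`.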

sanjoy · 2021-05-04T15:01:29Z

tensorflow/compiler/mlir/hlo/lib/Analysis/userange_analysis.cc

+    // Usually, we would expect the case of iterB beeing strictly before iterA.
+    // However, due to the initial assumption that all intervals of b are
+    // included in some interval of a, we do not need to check if iterB is
+    // striclty before iterA.


sanjoy · 2021-05-04T15:23:44Z

tensorflow/compiler/mlir/hlo/lib/Analysis/userange_analysis.cc

+  for (auto iterA = a.begin(), endA = a.end();
+       iterA != endA && iterB != endB;) {
+    // iterA is strictly before iterB => increment iterA.
+    if (iterA->second < iterB->first) ++iterA;


What if iterA reaches the end after this increment? Or is that logically impossible?

This is not possible. The prerequisite is that B is a "proper subset" of A. If iterA is on the last element and IntervalVector b is not subtracted yet, then iterA cannot be before iterB.

sanjoy · 2021-05-04T15:24:42Z

tensorflow/compiler/mlir/hlo/lib/Analysis/userange_analysis.cc

+    // iterB is at the start of iterA, but iterA has some values that go
+    // beyond those of iterB. We have to set the lower bound of iterA to the
+    // upper bound of iterB + 1 and increment iterB.
+    // A(3, 100) - B(3, 5) => A(6,100)


So these intervals are end-inclusive?

sanjoy · 2021-05-04T15:27:28Z

tensorflow/compiler/mlir/hlo/lib/Analysis/userange_analysis.cc

+    }
+    // iterB is in the middle of iterA. We have to split iterA and increment
+    // iterB.
+    // A(2, 10) B(5, 7) => (2, 4), (8, 10)


You're missing a - I think.

sanjoy · 2021-05-04T15:27:54Z

tensorflow/compiler/mlir/hlo/lib/Analysis/userange_analysis.cc

+                                        const IntervalVector& b) const {
+  auto iterB = b.begin();
+  auto endB = b.end();
+  for (auto iterA = a.begin(), endA = a.end();


Can some of the logic in the loop be factored into an operator-= function on UserangeInterval?
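The subtraction discussed in this thread can be sketched as a standalone function. This is an illustration under stated assumptions, not the PR's implementation: intervals are end-inclusive, both vectors are sorted and non-overlapping, and every interval of `b` lies inside some interval of `a`. The name `subtract` is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// End-inclusive interval [first, second] of operation IDs.
using UseInterval = std::pair<std::size_t, std::size_t>;
using IntervalVector = std::vector<UseInterval>;

// Returns a minus b. Precondition (assumed): both vectors are sorted and
// every interval of b is contained in some interval of a.
IntervalVector subtract(const IntervalVector& a, const IntervalVector& b) {
  IntervalVector result;
  auto iterB = b.begin();
  for (UseInterval current : a) {
    // Consume every interval of b that lies inside the current interval.
    while (iterB != b.end() && iterB->second <= current.second) {
      // Keep the part of current that precedes iterB, if there is one.
      if (current.first < iterB->first)
        result.emplace_back(current.first, iterB->first - 1);
      // Continue right after the end of iterB.
      current.first = iterB->second + 1;
      ++iterB;
    }
    // Keep whatever is left of current.
    if (current.first <= current.second) result.push_back(current);
  }
  return result;
}
```

On the two cases quoted from the source comments, `A(3, 100) - B(3, 5)` yields `(6, 100)` and `A(2, 10) - B(5, 7)` yields `(2, 4), (8, 10)`.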

gbaned · 2021-05-06T14:54:57Z

@dfki-albo Can you please check @sanjoy's comments and keep us posted? Thanks!

sherhut · 2021-06-18T11:12:11Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+    Operation* defOpA = a.getDefiningOp();
+    Operation* defOpB = b.getDefiningOp();
+
+    // If the alloc method or the number of operands is not the same the types


nit: Cannot be -> might not be.

sherhut · 2021-06-18T11:12:45Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+
+  /// Checks if the types of the given values are compatible for a
+  /// replacement.
+  bool checkTypeCompatibility(Value a, Value b) {


Don't call this type. It is based on more than types. Maybe just checkReuseCompatibility?

sherhut · 2021-06-18T11:14:41Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+        if (itemA == itemB || !checkTypeCompatibility(itemA, itemB)) continue;
+
+        // Check if itemA can replace itemB.
+        if (!userange.rangesInterfere(itemA, itemB)) continue;


Why is this negated? If the useranges interfere, there can be no reuse, right?

Yes, you are right. I'll add this to the comment and also move the negation into the method, because otherwise the name is misleading.
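With the negation moved into the method, a `true` result would mean "no reuse possible". A minimal sketch of such a check, assuming sorted, end-inclusive interval vectors; the real method may take further details such as read and write effects into account.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// End-inclusive interval [first, second] of operation IDs.
using UseInterval = std::pair<std::size_t, std::size_t>;
using IntervalVector = std::vector<UseInterval>;

// Returns true if any interval of a overlaps any interval of b.
// Assumes both vectors are sorted and internally non-overlapping.
bool rangesInterfere(const IntervalVector& a, const IntervalVector& b) {
  auto iterA = a.begin();
  auto iterB = b.begin();
  while (iterA != a.end() && iterB != b.end()) {
    if (iterA->second < iterB->first)
      ++iterA;  // iterA is strictly before iterB
    else if (iterB->second < iterA->first)
      ++iterB;  // iterB is strictly before iterA
    else
      return true;  // the two intervals share at least one ID
  }
  return false;
}
```

The call site then reads `if (userange.rangesInterfere(itemA, itemB)) continue;`, which matches the intent that interfering buffers are skipped.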

sherhut · 2021-06-18T11:19:04Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+          }
+          ++it;
+        }
+        if (it == potReuseVector.end()) potReuseVector.push_back(itemB);


Why not first search the insertion point it and then always insert using insert, even if it points to the end?
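The suggested pattern works because `insert` accepts the end iterator, so the separate `push_back` branch becomes unnecessary. A minimal sketch with a hypothetical helper `insertOrdered`; the ordering criterion used in the PR is not visible in the quoted hunk, so it is passed in as a comparison here.

```cpp
#include <algorithm>
#include <vector>

// Inserts `item` before the first element that should come after it.
// `less(x, y)` is a hypothetical ordering: true if x belongs before y.
template <typename T, typename Less>
void insertOrdered(std::vector<T>& items, const T& item, Less less) {
  auto insertionPoint =
      std::find_if(items.begin(), items.end(),
                   [&](const T& existing) { return less(item, existing); });
  // insert() also handles insertionPoint == items.end(), which appends.
  items.insert(insertionPoint, item);
}
```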

sherhut · 2021-06-18T11:19:31Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+    // Create a list of values that can potentially be replaced for each value
+    // in the useRangeMap. The potentialReuseMap maps each value to the
+    // respective list.
+    llvm::MapVector<Value, SmallVector<Value, 4>> potentialReuseMap;


nit: this function is very long. Consider splitting into smaller helpers.

sherhut · 2021-06-18T11:26:31Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+
+        // Iterate over the potential reuses and check if they can still be
+        // reused.
+        for (Value* potReuseValue = potReuses->begin();


Why a pointer? Just use Value

Maybe use std::remove_if?
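The erase-remove idiom referred to here can be sketched as follows. `eraseIf` is a hypothetical helper on `std::vector`; in the PR the predicate would be the combined check on each potential reuse.

```cpp
#include <algorithm>
#include <vector>

// Removes every element for which shouldRemove returns true.
template <typename T, typename Pred>
void eraseIf(std::vector<T>& items, Pred shouldRemove) {
  // remove_if moves the kept elements to the front and returns the new
  // logical end; erase then drops the leftover tail.
  items.erase(std::remove_if(items.begin(), items.end(), shouldRemove),
              items.end());
}
```

This replaces the hand-written iterator loop and avoids iterating through a `Value*`.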

sherhut · 2021-06-18T11:29:47Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+          if (replacedSet.contains(*potReuseValue) ||
+              transitiveInterference(*potReuseValue, potReuses,
+                                     actualReuseMap) ||
+              !userange.rangesInterfere(item, *potReuseValue))


I am always confused why this is negated. If the ranges interfere, then no reuse is possible, no?

sherhut · 2021-06-18T12:26:39Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+      for (auto itReuseMap = potentialReuseMap.begin();
+           itReuseMap != potentialReuseMap.end();) {
+        Value item = itReuseMap->first;
+        SmallVector<Value, 4>* potReuses = &itReuseMap->second;


How about using a reference?

sherhut · 2021-06-18T12:28:47Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+        // The defining block of itemA has to dominate all uses of itemB.
+        if (!dominatesAllUses(defOpBlock, itemB)) continue;
+
+        // Insert itemB into the right place of the potReuseVector. The order of


It would be interesting to explore other orders here. This is essentially greedy but one could try prioritize reuses for certain cases. Like if one has a copy, then source and destination should preferably be reused.

sherhut · 2021-06-18T12:29:40Z

tensorflow/compiler/mlir/hlo/tools/mlir-hlo-opt/mlir-hlo-opt.cpp

@@ -26,6 +27,7 @@ int main(int argc, char **argv) {
  mlir::registerAllPasses();
  mlir::mhlo::registerAllMhloPasses();
  mlir::lmhlo::registerAllLmhloPasses();
+  mlir::registerAllTransformPasses();



Why are these needed here and everywhere? Was that for debugging?

gbaned · 2021-06-25T17:44:49Z

@dfki-albo Can you please resolve conflicts? Thanks!

sherhut · 2021-07-07T12:01:27Z

tensorflow/compiler/mlir/hlo/include/mlir-hlo/Transforms/passes.td

@@ -18,6 +18,17 @@ limitations under the License.

 include "mlir/Pass/PassBase.td"

+def BufferReuse : FunctionPass<"buffer-reuse"> {
+  let summary = "Reuses already allocated buffers to save allocation "
+                "operations if is provably safe.";


nit: if it is

sherhut · 2021-07-07T12:03:09Z

tensorflow/compiler/mlir/hlo/lib/Analysis/test_userange_analysis.cc

@@ -30,6 +30,11 @@ struct TestUserangePass : public TestUserangeBase<TestUserangePass> {
    registry.insert<mlir::lmhlo::LmhloDialect>();
  }

+  StringRef getArgument() const final { return "test-print-userange"; }


Why is this needed?

sherhut · 2021-07-07T12:06:49Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+          continue;
+
+        // Get the defining block of itemA.
+        Block *defOpBlock = itemA.isa<BlockArgument>()


You can always use getParentBlock() here. It returns the block of the defining op, as well.

Also, itemA is an alloc, so it cannot be a block argument, right?

sherhut · 2021-07-07T12:09:29Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+        potReuseVector.insert(insertionPoint, itemB);
+      }
+
+      potentialReuseMap.insert(


Do you need the full type here or would {itemA, potReuseVector} suffice?

sherhut · 2021-07-07T12:13:44Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+      Value potReuseValue, SmallVector<Value, 4> &potReuses,
+      llvm::MapVector<Value, DenseSet<Value>> &actualReuseMap) {
+    return actualReuseMap.find(potReuseValue) != actualReuseMap.end() &&
+           llvm::any_of(actualReuseMap[potReuseValue], [&](Value vReuse) {


Nit: This does a double lookup. If you already use find, you can use the result of the find to access the found value.

sherhut · 2021-07-07T12:15:28Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+      llvm::MapVector<Value, DenseSet<Value>> &actualReuseMap) {
+    return actualReuseMap.find(potReuseValue) != actualReuseMap.end() &&
+           llvm::any_of(actualReuseMap[potReuseValue], [&](Value vReuse) {
+             return !std::count(potReuses.begin(), potReuses.end(), vReuse);


Nit: Use find here, as that stops after first occurrence.
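Both nits on this helper can be addressed together. The sketch below uses standard containers and integer buffers as hypothetical stand-ins for `llvm::MapVector`, `DenseSet` and `Value`: the result of the single `find` is reused to access the mapped set, and membership is tested with `std::find`, which stops at the first match.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for an mlir::Value.
using Buffer = int;

// True if potReuseValue already replaces some buffer that is not among the
// potential reuses given in potReuses.
bool transitiveInterference(
    Buffer potReuseValue, const std::vector<Buffer>& potReuses,
    const std::map<Buffer, std::set<Buffer>>& actualReuseMap) {
  // Single lookup: keep the iterator and reuse it below.
  auto it = actualReuseMap.find(potReuseValue);
  if (it == actualReuseMap.end()) return false;
  return std::any_of(it->second.begin(), it->second.end(), [&](Buffer vReuse) {
    // std::find stops at the first occurrence, unlike std::count.
    return std::find(potReuses.begin(), potReuses.end(), vReuse) ==
           potReuses.end();
  });
}
```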

sherhut · 2021-07-07T12:19:30Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+      return false;
+
+    // If all operands are equal the types are compatible.
+    for (auto const &pair :


This is not true. Consider memref<?x5xi32> vs. memref<5x?xi32>. Also, you need to consider the basetype here, too.

As an extension, this could also work on the size of the element type. Not in this PR, please leave a TODO for this.
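The counterexample shows why comparing only the operands of the two allocs is not enough: both allocs can take the same single size operand while the dynamic dimension sits in a different position. A minimal sketch of a stricter check, using a hypothetical `AllocSketch` struct in place of the real alloc operation and memref type; it also shows where `std::equal` fits.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical marker for a '?' dimension.
constexpr std::int64_t kDynamic = -1;

// Hypothetical model of an alloc: element type, shape, and the IDs of the
// SSA values that supply the sizes of the dynamic dimensions.
struct AllocSketch {
  std::string elementType;
  std::vector<std::int64_t> shape;
  std::vector<int> dynamicSizes;
};

// Compatible only if element type and shape match exactly and the dynamic
// sizes are supplied by the same values in the same order.
bool checkReuseCompatibility(const AllocSketch& a, const AllocSketch& b) {
  return a.elementType == b.elementType && a.shape == b.shape &&
         std::equal(a.dynamicSizes.begin(), a.dynamicSizes.end(),
                    b.dynamicSizes.begin(), b.dynamicSizes.end());
}
```

Under this check, `memref<?x5xi32>` and `memref<5x?xi32>` are rejected even when both are allocated with the same size operand. Reuse across element types of equal size is left out, in line with the requested TODO.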

sherhut · 2021-07-07T12:19:59Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+  /// A Fixpoint iteration over the potential reuses to compute the actual
+  /// reuses.
+  llvm::MapVector<Value, DenseSet<Value>> computeActualReuse(
+      llvm::MapVector<Value, SmallVector<Value, 4>> &potentialReuseMap) {


Having read this type so often, maybe give it a name?

sherhut · 2021-07-07T12:25:20Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+      llvm::MapVector<Value, DenseSet<Value>> &actualReuseMap) {
+    for (auto &potReuser : potentialReuseMap) {
+      Value item = potReuser.first;
+      SmallVector<Value, 4> potReuses = potReuser.second;


nit: this copies the vector.

sherhut · 2021-07-08T14:54:03Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+      return false;
+
+    // If all operands are equal the types are compatible.
+    for (auto const &pair :


I was a bit imprecise. Please fix the issue with partially static types and also consider the basetype.

An extension to support reuse for elementtypes of the same size can be a TODO.

sherhut

Why is the code formatted differently in this PR? Is that an accidental change?

Please fix the comment and use std::equal. I will run presubmits on this change anyway, though.

sherhut · 2021-07-12T16:08:19Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+        defOpA->getNumOperands() != defOpB->getNumOperands())
+      return false;
+
+    // TODO: Fix for memref<?x5xi32> vs memref<5x?xi32>, also consider the


This is now fixed.

sherhut · 2021-07-12T16:08:33Z

tensorflow/compiler/mlir/hlo/lib/Transforms/buffer_reuse.cc

+      return false;
+
+    // If all operands are equal the types are compatible.
+    for (auto const &pair :


Use std::equal?

sherhut · 2021-07-12T17:56:54Z

tensorflow/compiler/mlir/hlo/BUILD

+    deps = [
+        ":hlo",
+        ":transforms_pass_inc_gen",
+        "@llvm-project//mlir:MemRefDialect",


These are ordered incorrectly. See the builder log.

Imported from GitHub PR tensorflow/tensorflow#48883

In this PR, we want to introduce a new optimization on reusing already allocated buffers to save memory consumption. The optimization consists of two steps. First, for each buffer, we find a list of buffers that are potential reuses. A possible reuse has the following properties:

- the types are compatible
- no interference in the `UserangeAnalysis` (#48847)
- the dominance is still given

The second step is a fixpoint iteration over the potential reuses. This step is divided into two substeps:

- try to assign possible reuses for each buffer
- update the potential reuses based on the assignments from step 1.

After the distribution of all possible reusable buffers is done, they are actually replaced.

```
For example:                        Result:
func @simpleReuse(%arg0: i1) {      func @simpleReuse(%arg0: i1) {
  %0 = alloc()                        %0 = alloc()
  %1 = alloc()
  cond_br %arg0, ^bb1, ^bb2           cond_br %arg0, ^bb1, ^bb2
 ^bb1:                               ^bb1:
  use(%0)                             use(%0)
  br ^bb3                             br ^bb3
 ^bb2:                               ^bb2:
  use(%1)                             use(%0)
  br ^bb3                             br ^bb3
 ^bb3:                               ^bb3:
  return                              return
}                                   }
```

In this simple example `%1` can be replaced with `%0`, because all requirements mentioned above are fulfilled.

This PR is a follow up to #48847, in which we introduced the `UserangeAnalysis`.

Copybara import of the project:

-- ba20a66f43af9a4a6d2639116de768dd050016a8 by Alexander Bosch <Alexander.Bosch@dfki.de>: PR #48847
-- d7b1461fa31c2e847d6eaf087cefd30a8fcdb90b by Alexander Bosch <Alexander.Bosch@dfki.de>: Implementation of a BufferReuse Optimization.
-- 5dfaf703438c4260b0edc0334f0b0428b47cca5c by Alexander Bosch <Alexander.Bosch@dfki.de>: Addressed reviewers comments.
-- 299f161e3c8630b577fb529b1662f8c9f6395fca by Alexander Bosch <Alexander.Bosch@dfki.de>: Addressed reviewer comments.
-- bc43c5670dd1b52441c8d11e7863aad3b8a01c85 by Alexander Bosch <Alexander.Bosch@dfki.de>: Fixed false pass registration.
-- d28e1964a13b0cd8f60c4586e0e93c1d670b8f3d by Alexander Bosch <Alexander.Bosch@dfki.de>: Resolved conflicts.
-- 51ebd09c55b8da21421646a6c10a23995cbd0d2a by Alexander Bosch <Alexander.Bosch@dfki.de>: Rebased with underlying PR #48847.
-- 5ee27a13cda451e0cf520951fc2e08450e4984ec by Alexander Bosch <Alexander.Bosch@dfki.de>: Addressed reviewers comments.
-- 456af48684a1bf1d549e244340afc639e804051d by Alexander Bosch <Alexander.Bosch@dfki.de>: Fixed a bug in checkReuseCompatibility.
-- cff3fa094a6928d24f36ed9432176287bd396800 by Alexander Bosch <Alexander.Bosch@dfki.de>: Fixed formatting issues and addressed reviewers comments.
-- 19928064d51c70aca9f5031cc743ea3e39a20541 by Alexander Bosch <Alexander.Bosch@dfki.de>: Removed anti-pattern.

PiperOrigin-RevId: 385111737

google-ml-butler bot added the size:XL CL Change Size:Extra Large label May 3, 2021

google-ml-butler bot requested review from joker-eph and sanjoy May 3, 2021 09:59

google-cla bot added the cla: yes label May 3, 2021

dfki-thsc mentioned this pull request May 3, 2021

[mlir-hlo] Added pass to transform view to alloc operations of the memref dialect. #48827

Closed

gbaned self-assigned this May 3, 2021

gbaned added this to Assigned Reviewer in PR Queue via automation May 3, 2021

gbaned requested a review from smit-hinsu May 3, 2021 13:53

sanjoy suggested changes May 4, 2021

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes May 4, 2021

smit-hinsu removed their request for review May 4, 2021 21:29

gbaned added the stat:awaiting response Status - Awaiting response from author label May 6, 2021

ghost mentioned this pull request May 7, 2021

[mlir-hlo] Added Userange Analysis for Buffers. #48847

Merged

ghost marked this pull request as draft May 7, 2021 10:58

sherhut reviewed Jun 18, 2021

ghost marked this pull request as ready for review June 23, 2021 09:52

sherhut suggested changes Jul 7, 2021

Alexander Bosch added 8 commits July 8, 2021 10:55

PR #48847

ba20a66

Implementation of a BufferReuse Optimization.

d7b1461

Addressed reviewers comments.

5dfaf70

Addressed reviewer comments.

299f161

Fixed false pass registration.

bc43c56

Resolved conflicts.

d28e196

Rebased with underlying PR #48847.

51ebd09

Addressed reviewers comments.

5ee27a1

gbaned requested a review from sherhut July 8, 2021 11:11

gbaned removed the stat:awaiting response Status - Awaiting response from author label Jul 8, 2021

sherhut suggested changes Jul 8, 2021

Fixed a bug in checkReuseCompatibility.

456af48

sherhut suggested changes Jul 12, 2021

sherhut added the kokoro:force-run Tests on submitted change label Jul 12, 2021

kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 12, 2021

sherhut reviewed Jul 12, 2021

Alexander Bosch added 2 commits July 13, 2021 09:08

Fixed formatting issues and addressed reviewers comments.

cff3fa0

Removed anti-pattern.

1992806

sherhut approved these changes Jul 13, 2021

google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jul 13, 2021

kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 13, 2021

copybara-service bot merged commit 4226d27 into tensorflow:master Jul 16, 2021

PR Queue automation moved this from Reviewer Requested Changes to Merged Jul 16, 2021
