Add support for some multi-store cases in affine loop fusion #162

dcaballe · 2019-10-01T23:31:01Z

Hello!

We've been giving affine loop fusion a try and we are pretty happy with the initial results! Great work! We are seeing quite a few loop nests being fused in our models!

We noticed that currently only single-store producer loops are supported and would like to contribute a fix for that since it seems to be an important limitation. Even though our loop nests usually have a single store, this limitation is exposed when several single-store loop nests can be fused. The following example is a snippet of a model we are working on, where 4 loop nests could be fused into a single one:

func @main(%in : memref<1048576x256xf32>, %out : memref<1048576x256xf32>) {
  %cst = constant 0.000000e+00 : f32

  %6 = alloc() : memref<1048576x256xf32>
  affine.for %arg7 = 0 to 1048576 {
    affine.for %arg8 = 0 to 256 {
      %13 = affine.load %in[%arg7, %arg8] : memref<1048576x256xf32>
      %14 = affine.load %in[%arg7, %arg8] : memref<1048576x256xf32>
      %15 = mulf %14, %13 : f32
      affine.store %15, %6[%arg7, %arg8] : memref<1048576x256xf32>
    }
  }
  %7 = alloc() : memref<1048576x256xf32>
  affine.for %arg7 = 0 to 1048576 {
    affine.for %arg8 = 0 to 256 {
      %13 = affine.load %6[%arg7, %arg8] : memref<1048576x256xf32>
      %14 = cmpf "ogt", %13, %cst : f32
      %15 = select %14, %13, %cst : f32
      affine.store %15, %7[%arg7, %arg8] : memref<1048576x256xf32>
    }
  }
  %8 = alloc() : memref<1048576x256xf32>
  affine.for %arg7 = 0 to 1048576 {
    affine.for %arg8 = 0 to 256 {
      %13 = affine.load %7[%arg7, %arg8] : memref<1048576x256xf32>
      %14 = affine.load %7[%arg7, %arg8] : memref<1048576x256xf32>
      %15 = mulf %14, %13 : f32
      affine.store %15, %8[%arg7, %arg8] : memref<1048576x256xf32>
    }
  }
  affine.for %arg7 = 0 to 1048576 {
    affine.for %arg8 = 0 to 256 {
      %13 = affine.load %8[%arg7, %arg8] : memref<1048576x256xf32>
      %14 = cmpf "ogt", %13, %cst : f32
      %15 = select %14, %13, %cst : f32
      affine.store %15, %out[%arg7, %arg8] : memref<1048576x256xf32>
    }
  }
  dealloc %6 : memref<1048576x256xf32>
  dealloc %7 : memref<1048576x256xf32>
  dealloc %8 : memref<1048576x256xf32>
  return
}

However, affine loop fusion only fuses the first 3 loops and not the 4th one:

module {
  func @main(%arg0: memref<1048576x256xf32>, %arg1: memref<1048576x256xf32>) {
    %cst = constant 0.000000e+00 : f32
    %0 = alloc() : memref<1048576x256xf32>
    %1 = alloc() : memref<1048576x256xf32>
    %2 = alloc() : memref<1048576x256xf32>
    affine.for %arg2 = 0 to 1048576 {
      affine.for %arg3 = 0 to 256 {
        %3 = affine.load %arg0[%arg2, %arg3] : memref<1048576x256xf32>
        %4 = affine.load %arg0[%arg2, %arg3] : memref<1048576x256xf32>
        %5 = mulf %4, %3 : f32
        affine.store %5, %0[%arg2, %arg3] : memref<1048576x256xf32>
        %6 = affine.load %0[%arg2, %arg3] : memref<1048576x256xf32>
        %7 = cmpf "ogt", %6, %cst : f32
        %8 = select %7, %6, %cst : f32
        affine.store %8, %1[%arg2, %arg3] : memref<1048576x256xf32>
        %9 = affine.load %1[%arg2, %arg3] : memref<1048576x256xf32>
        %10 = affine.load %1[%arg2, %arg3] : memref<1048576x256xf32>
        %11 = mulf %10, %9 : f32
        affine.store %11, %2[%arg2, %arg3] : memref<1048576x256xf32>
      }
    }
    affine.for %arg2 = 0 to 1048576 {
      affine.for %arg3 = 0 to 256 {
        %3 = affine.load %2[%arg2, %arg3] : memref<1048576x256xf32>
        %4 = cmpf "ogt", %3, %cst : f32
        %5 = select %4, %3, %cst : f32
        affine.store %5, %arg1[%arg2, %arg3] : memref<1048576x256xf32>
      }
    }
    dealloc %0 : memref<1048576x256xf32>
    dealloc %1 : memref<1048576x256xf32>
    dealloc %2 : memref<1048576x256xf32>
    return
  }
}

This happens because the algorithm starts with the 3rd loop nest as the consumer and fuses the 2nd and 1st ones into it. It then picks the 4th as the consumer and tries to fuse the previously fused nest into it, which is not possible because that nest now has multiple stores.

This PR is a stepping stone towards supporting generic multi-store producer loop nests in affine loop fusion. It extends the algorithm to support fusion of multi-store producer loop nests that:

1. have only one store that writes to a function-local live out, and
2. have remaining stores that are only involved in loop nest self dependences or in no dependences within the function.
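
As a rough illustration of the two conditions, here is a toy C++ model of the example above. It is not the code in LoopFusion.cpp: `LoopNest`, `fuseInto`, `canFuseSingleStoreOnly`, `canFuseRelaxed` and `runExample` are hypothetical names, memrefs are plain strings, and a store is assumed to be "live out" of a nest when some other loop nest in the function loads from its memref.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an affine loop nest: only the memrefs it loads and stores.
struct LoopNest {
  std::vector<std::string> loads;
  std::vector<std::string> stores;
};

// Fusing a producer into a consumer yields one nest performing both bodies.
LoopNest fuseInto(const LoopNest &producer, const LoopNest &consumer) {
  LoopNest fused = producer;
  fused.loads.insert(fused.loads.end(), consumer.loads.begin(),
                     consumer.loads.end());
  fused.stores.insert(fused.stores.end(), consumer.stores.begin(),
                      consumer.stores.end());
  return fused;
}

// Restriction described in the PR text: single-store producers only.
bool canFuseSingleStoreOnly(const LoopNest &producer) {
  return producer.stores.size() == 1;
}

// Assumed reading of the relaxed rule: exactly one store writes a memref that
// another nest loads; every other store feeds only the producer itself or
// nothing at all within the function.
bool canFuseRelaxed(const LoopNest &producer,
                    const std::vector<LoopNest> &otherNests) {
  auto readElsewhere = [&](const std::string &memref) {
    return std::any_of(otherNests.begin(), otherNests.end(),
                       [&](const LoopNest &nest) {
                         return std::find(nest.loads.begin(), nest.loads.end(),
                                          memref) != nest.loads.end();
                       });
  };
  return std::count_if(producer.stores.begin(), producer.stores.end(),
                       readElsewhere) == 1;
}

// The four nests of the example; "m6", "m7", "m8" stand for %6, %7, %8.
void runExample() {
  LoopNest n1{{"in"}, {"m6"}};
  LoopNest n2{{"m6"}, {"m7"}};
  LoopNest n3{{"m7"}, {"m8"}};
  LoopNest n4{{"m8"}, {"out"}};
  // 2nd and then 1st are fused into the 3rd, as described above.
  LoopNest fused = fuseInto(n1, fuseInto(n2, n3));
  assert(fused.stores.size() == 3);
  assert(!canFuseSingleStoreOnly(fused));  // old rule blocks the 4th fusion
  assert(canFuseRelaxed(fused, {n4}));     // only the "m8" store is live out
}
```

In this model the fused nest stores to three memrefs, but only one of them is read by the remaining nest, so the relaxed rule accepts it while the single-store rule does not.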

It would be great to get feedback on this fix: whether this approach is aligned with what you envision for the generic multi-output problem, and whether there are more problematic cases that this PR does not yet address properly.

Thanks!
Diego

bondhugula · 2019-10-02T02:38:54Z

Hi Diego, Thanks very much! The single-store limitation has indeed been a key limitation that we may want to immediately get rid of. I'll be happy to provide feedback on this PR - @andydavis1 will have more useful feedback than I here though.

andydavis1 · 2019-10-02T17:33:08Z

Hello Diego,

Glad you are using Affine Loop Fusion, and thanks for the feedback and PR!!! I'll take a look at the PR in a bit.
W.r.t. future plans for this pass: I do have a plan for a new Affine Loop Fusion pass, which will handle many of these corner cases in a unified way (and result in a simpler pass). I've started to build out the utilities for the new pass here: https://github.com/tensorflow/mlir/blob/master/include/mlir/Transforms/LoopFusionUtils.h
Stay tuned!

dcaballe · 2019-10-02T17:40:26Z

Great! Thank you both!
Please, keep me posted on the new fusion algorithm and let me know if I can help with reviews or something. Do you plan to send an RFC highlighting the main changes, limitations and improvements?

bondhugula

A first round of superficial comments.

bondhugula · 2019-10-02T15:14:48Z

lib/Transforms/LoopFusion.cpp

+  AffineStoreOp getUniqueStoreToLiveOut(Node *node) {
+    AffineStoreOp uniqueStore;
+    for (auto *op : node->stores) {
+      auto storeOpInst = cast<AffineStoreOp>(op);


Nit: The "Inst" suffix dates from when MLIR operations were called Instructions. You can just drop it now, i.e.,
storeOpInst -> storeOp

bondhugula · 2019-10-02T15:15:17Z

lib/Transforms/LoopFusion.cpp

@@ -322,6 +322,38 @@ struct MemRefDependenceGraph {
    return false;
  }

+  // Returns the unique AffineStoreOp in `node` that meets all the following:
+  //   *) store is the only one that writes to a function-local live out memref,
+  //   *) store is not the source of a `node` self-dependence.


a self dependence on node ?

bondhugula · 2019-10-02T15:17:49Z

lib/Transforms/LoopFusion.cpp

+  //   *) store is the only one that writes to a function-local live out memref,
+  //   *) store is not the source of a `node` self-dependence.
+  // Otherwise, returns a null AffineStoreOp.
+  AffineStoreOp getUniqueStoreToLiveOut(Node *node) {


This should be a local / static function.

It uses class member outEdges. I could make it a static non-member function but I think it makes sense as it is. It's similar to other member functions, such as writesToLiveInOrEscapingMemrefs.

Sorry, had missed it; fine as is.

bondhugula · 2019-10-02T15:22:09Z

lib/Transforms/LoopFusion.cpp

+          // live-out.
+          // TODO(andydavis) Support more generic multi-output src loop nests
+          // fusion.
+          auto srcStoreOpInst = mdg->getUniqueStoreToLiveOut(srcNode);


srcStoreOpInst -> srcStoreOp

bondhugula · 2019-10-02T15:23:08Z

test/Transforms/loop-fusion.mlir

@@ -1259,6 +1260,7 @@ func @should_not_fuse_multi_output_producer() {
  // CHECK-NEXT:  }
  // CHECK-NEXT:  affine.for %{{.*}} = 0 to 10 {
  // CHECK-NEXT:    %{{.*}} = affine.load %{{.*}}[%{{.*}}] : memref<10xf32>
+  // CHECK-NEXT:    %{{.*}} = affine.load %{{.*}}[%{{.*}}] : memref<10xf32>


No need to match the result here and in the one above.

bondhugula · 2019-10-02T15:28:51Z

lib/Transforms/LoopFusion.cpp

@@ -972,25 +1004,17 @@ static Value *createPrivateMemRef(AffineForOp forOp, Operation *srcStoreOpInst,
 // TODO(andydavis) Generalize this to handle more live in/out cases.
 static bool canFuseSrcWhichWritesToLiveOut(unsigned srcId, unsigned dstId,
                                           Value *memref,
+                                           AffineStoreOp srcStoreOpInst,


srcStoreOpInst -> srcStoreOp

bondhugula · 2019-10-02T15:30:12Z

test/Transforms/loop-fusion.mlir

+    %v1 = affine.load %m[%i1] : memref<10xf32>
+  }
+  // CHECK:      affine.for [[i0:%.*]] = 0 to 10 {
+  // CHECK-NEXT:   affine.store {{%.*}}, [[LOCAL_M:%.*]]{{\[}}[[i0]]{{\]}} : memref<10xf32>


%{{.*}} to be consistent.

bondhugula · 2019-10-02T15:31:12Z

test/Transforms/loop-fusion.mlir

+  affine.for %i1 = 0 to 10 {
+    %v1 = affine.load %m[%i1] : memref<10xf32>
+  }
+  // CHECK:      affine.for [[i0:%.*]] = 0 to 10 {


You can capture this as:
%[[i0:.*]] and then use %[[i0]]. This way you won't need the escapes below.

Nice! Thanks! I forgot to ask about alternatives for this since I found it a bit uncomfortable.

bondhugula · 2019-10-02T15:31:46Z

test/Transforms/loop-fusion.mlir

+  }
+  // CHECK:      affine.for [[i0:%.*]] = 0 to 10 {
+  // CHECK-NEXT:   affine.store {{%.*}}, [[LOCAL_M:%.*]]{{\[}}[[i0]]{{\]}} : memref<10xf32>
+  // CHECK-NEXT:   [[v0:%.*]] = affine.load [[LOCAL_M]]{{\[}}[[i0]]{{\]}} : memref<10xf32>


This could be rewritten as:
[%[[i0]]]

bondhugula · 2019-10-02T15:33:59Z

test/Transforms/loop-fusion.mlir

+    %v0 = affine.load %m[%i1] : memref<10xf32>
+  }
+  // CHECK:      affine.for [[i0:%.*]] = 0 to 10 {
+  // CHECK-NEXT:   affine.store {{%.*}}, {{%.*\[}}[[i0]]{{\]}} : memref<10xf32>


Likewise. Capture %[[i0:.*]] and use [%[[i0]]] -- more readable this way.

dcaballe

Thanks, Uday! Addressed.

dcaballe · 2019-10-02T22:30:05Z

lib/Transforms/LoopFusion.cpp

+  //   *) store is the only one that writes to a function-local live out memref,
+  //   *) store is not the source of a `node` self-dependence.
+  // Otherwise, returns a null AffineStoreOp.
+  AffineStoreOp getUniqueStoreToLiveOut(Node *node) {


It uses class member outEdges. I could make it a static non-member function but I think it makes sense as it is. It's similar to other member functions, such as writesToLiveInOrEscapingMemrefs.

dcaballe · 2019-10-02T22:46:13Z

test/Transforms/loop-fusion.mlir

+  affine.for %i1 = 0 to 10 {
+    %v1 = affine.load %m[%i1] : memref<10xf32>
+  }
+  // CHECK:      affine.for [[i0:%.*]] = 0 to 10 {


Nice! Thanks! I forgot to ask about alternatives for this since I found it a bit uncomfortable.

bondhugula

This PR looks great to me! Just some efficiency and documentation related suggestions.

bondhugula · 2019-10-03T09:24:39Z

lib/Transforms/LoopFusion.cpp

@@ -322,6 +322,38 @@ struct MemRefDependenceGraph {
    return false;
  }

+  // Returns the unique AffineStoreOp in `node` that meets all the following:
+  //   *) store is the only one that writes to a function-local live out memref,


I sort of prefer:
"function-local memref live out of node"
The comments inside should be fine.

bondhugula · 2019-10-03T09:29:17Z

lib/Transforms/LoopFusion.cpp

+  //   *) store is the only one that writes to a function-local live out memref,
+  //   *) store is not the source of a `node` self-dependence.
+  // Otherwise, returns a null AffineStoreOp.
+  AffineStoreOp getUniqueStoreToLiveOut(Node *node) {


Sorry, had missed it; fine as is.

bondhugula · 2019-10-03T09:36:07Z

lib/Transforms/LoopFusion.cpp

@@ -972,33 +1004,25 @@ static Value *createPrivateMemRef(AffineForOp forOp, Operation *srcStoreOpInst,
 // TODO(andydavis) Generalize this to handle more live in/out cases.
 static bool canFuseSrcWhichWritesToLiveOut(unsigned srcId, unsigned dstId,
                                           Value *memref,
+                                           AffineStoreOp srcStoreOp,


The comment for this function needs an update: 'srcNode' could write to multiple memrefs, but only one of them may have an outgoing edge.

Function comment doesn't document 'srcStoreOp'.

I revisited this a bit, since memref is no longer needed and some of the checks should actually be asserts after getUniqueStoreToLiveOut. Please let me know if the documentation is not clear. We can iterate on it.

bondhugula · 2019-10-03T09:36:44Z

lib/Transforms/LoopFusion.cpp

-  // Check that all stores are to the same memref.
-  if (storeMemrefs.size() != 1 ||
-      mdg->getOutEdgeCount(srcNode->id, memref) != 1)
+  // Return false if 'srcNode' has more than one output edge on 'memref'.


unless 'srcNode' has exactly one outgoing edge on 'memref'.

bondhugula · 2019-10-03T09:42:59Z

lib/Transforms/LoopFusion.cpp

+  //   *) store is the only one that writes to a function-local live out memref,
+  //   *) store is not the source of a self-dependence on `node`.
+  // Otherwise, returns a null AffineStoreOp.
+  AffineStoreOp getUniqueStoreToLiveOut(Node *node) {


getUniqueOutgoingStore or getUniqueLiveOutStore sounds better to me.

bondhugula · 2019-10-03T09:50:54Z

test/Transforms/loop-fusion.mlir

+// -----
+
+// CHECK-LABEL: func @should_fuse_function_live_out_multi_store_producer
+func @should_fuse_function_live_out_multi_store_producer(%live_out_m : memref<10xf32>) {


Nit: This is really %arg_m or %live_in_out_m.

bondhugula · 2019-10-03T10:03:58Z

lib/Transforms/LoopFusion.cpp

+          // live-out.
+          // TODO(andydavis) Support more generic multi-output src loop nests
+          // fusion.
+          auto srcStoreOp = mdg->getUniqueStoreToLiveOut(srcNode);


Shouldn't this store be on 'memref'? Since you are actually checking getOutEdgeCount != 1, you've made sure to eliminate the "no store case" on memref. So, if this finds a store, it will have to be on 'memref'.

Yeah, it looks like it. I had some problems with this assumption in an early implementation but probably something was wrong. I think it's better to keep getUniqueStoreToLiveOut a bit more generic so I'll add an assert for that constraint here. Thanks!
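
A minimal C++ sketch of the check order discussed in this thread, under stated assumptions: `Edge`, `OutEdgeMap` and `canFuseProducerOn` are hypothetical stand-ins, memrefs are plain strings, and the result of `getUniqueStoreToLiveOut` is passed in as an optional memref name rather than an `AffineStoreOp`. Only the name `getOutEdgeCount` is taken from the diff; its body here is a guess at the behaviour.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical dependence edge: which node depends on us, and through which
// memref.
struct Edge {
  unsigned dstId;
  std::string memref;
};
using OutEdgeMap = std::unordered_map<unsigned, std::vector<Edge>>;

// Counts the outgoing edges of node `srcId` that are carried by `memref`.
unsigned getOutEdgeCount(const OutEdgeMap &outEdges, unsigned srcId,
                         const std::string &memref) {
  auto it = outEdges.find(srcId);
  if (it == outEdges.end())
    return 0;
  unsigned count = 0;
  for (const Edge &edge : it->second)
    if (edge.memref == memref)
      ++count;
  return count;
}

// Rejects the producer unless it has exactly one outgoing edge on `memref`.
// Once that holds, a unique live-out store (if one was found) is expected to
// write `memref`, so that expectation becomes an assert, not a runtime check.
bool canFuseProducerOn(const OutEdgeMap &outEdges, unsigned srcId,
                       const std::string &memref,
                       const std::optional<std::string> &uniqueStoreMemref) {
  if (getOutEdgeCount(outEdges, srcId, memref) != 1)
    return false;
  if (!uniqueStoreMemref)
    return false;
  assert(*uniqueStoreMemref == memref &&
         "unique live-out store must write 'memref'");
  return true;
}
```

In this sketch the count check also rejects a producer whose memref is read through two outgoing edges, so the assert is only reached in the single-edge case.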

bondhugula · 2019-10-03T10:08:09Z

lib/Transforms/LoopFusion.cpp

+    for (auto *op : node->stores) {
+      auto storeOp = cast<AffineStoreOp>(op);
+      auto *memref = storeOp.getMemRef();
+      auto outEdgeIt = outEdges.find(node->id);


This is invariant and can be hoisted out.

bondhugula · 2019-10-03T10:09:25Z

lib/Transforms/LoopFusion.cpp

+      //    (self-dependence edges are not represented in graph at the moment),
+      // *) writes to a function live out memref (function parameter), or
+      // *) is dead.
+      if (outEdgeIt == outEdges.end() ||


This is invariant and can be hoisted out and you can exit early. If the node doesn't have outgoing edges, just return nullptr?!

Yeah, good catch!

bondhugula · 2019-10-03T10:11:41Z

lib/Transforms/LoopFusion.cpp

+      // *) writes to a function live out memref (function parameter), or
+      // *) is dead.
+      if (outEdgeIt == outEdges.end() ||
+          llvm::all_of(outEdgeIt->second, [=](const Edge &edge) {


Hoist out outEdgeIt->second access as well.

const auto &nodeOutEdges = outEdgeIt->second;

The only thing needed inside is:

if (llvm::all_of(nodeOutEdges, [=](const Edge &edge) { ... })) continue;
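
Put together, the hoisting suggestions in the last three comments might look like the following standalone C++ sketch. It uses standard-library containers instead of the MLIR and LLVM types; `Edge`, `Node`, `DependenceGraph` and `checkFusedNest` are simplified, hypothetical stand-ins. The lambda body (an edge matches when it is carried by the store's memref) is an assumption, since it is elided in the excerpt above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct Edge {
  unsigned dstId;      // node that depends on this one
  std::string memref;  // memref carrying the dependence
};

struct Node {
  unsigned id;
  std::vector<std::string> stores;  // memref written by each store, in order
};

struct DependenceGraph {
  std::unordered_map<unsigned, std::vector<Edge>> outEdges;

  // Returns the index of the only store in `node` whose memref carries an
  // outgoing edge, or std::nullopt if there is none or more than one.
  std::optional<std::size_t> getUniqueStoreToLiveOut(const Node &node) const {
    // Loop-invariant lookup, done once: without outgoing edges no store of
    // this node can be live out, so exit early.
    auto outEdgeIt = outEdges.find(node.id);
    if (outEdgeIt == outEdges.end())
      return std::nullopt;
    const auto &nodeOutEdges = outEdgeIt->second;

    std::optional<std::size_t> uniqueStore;
    for (std::size_t i = 0; i < node.stores.size(); ++i) {
      const std::string &memref = node.stores[i];
      // Skip stores whose memref no other node depends on.
      if (std::all_of(nodeOutEdges.begin(), nodeOutEdges.end(),
                      [&](const Edge &edge) { return edge.memref != memref; }))
        continue;
      if (uniqueStore)
        return std::nullopt;  // a second candidate: not unique
      uniqueStore = i;
    }
    return uniqueStore;
  }
};

// Usage: the fused nest from the PR description, with one edge to the 4th nest.
void checkFusedNest() {
  DependenceGraph graph;
  graph.outEdges[0].push_back(Edge{1, "m8"});
  Node fused{0, {"m6", "m7", "m8"}};
  assert(graph.getUniqueStoreToLiveOut(fused) ==
         std::optional<std::size_t>(2));
}
```

Only the per-store `all_of` test remains inside the loop; the map lookup and the early exit run once per node.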

dcaballe · 2019-10-03T22:27:58Z

Thanks, Uday!

andydavis1 · 2019-10-03T22:35:05Z

test/Transforms/loop-fusion.mlir

+// -----
+
+// CHECK-LABEL: func @should_fuse_self_dependence_multi_store_producer() {
+func @should_fuse_self_dependence_multi_store_producer() {


Do you have any multi-store fusion unit tests where one of the fused loops stores to a live out memref?
func @test(%live_out : memref<64x9xi32>) -> memref<64x9xi32> {
// Add loop nest here which stores to %live_out
// Add other loop nests to fuse with above loop nest.
return %live_out : memref<64x9xi32>
}

Is the one in 2325 (should_fuse_function_live_out_multi_store_producer) what you are looking for?

Yes. Thanks.

dcaballe · 2019-10-07T22:43:48Z

Is anything else needed?

bondhugula · 2019-10-08T02:41:32Z

This looks great to me.

andydavis1 · 2019-10-08T16:46:08Z

Yes. Looks good. Thanks Diego!

This PR is a stepping stone towards supporting generic multi-store source loop nests in affine loop fusion. It extends the algorithm to support fusion of multi-store loop nests that: 1. have only one store that writes to a function-local live out, and 2. the remaining stores are involved in loop nest self dependences or no dependences within the function. Closes #162 COPYBARA_INTEGRATE_REVIEW=tensorflow/mlir#162 from dcaballe:dcaballe/multi-output-fusion 7fb7dec6fe8b45f5ce176f018bfe37b256420c45 PiperOrigin-RevId: 273773907

tensorflow#162 introduced a bug that incorrectly allowed fusion of producer loops with multiple outgoing edges. This commit fixes that problem. It also introduces a new flag to disable sibling loop fusion so that we can test producer-consumer fusion in isolation.

googlebot added the cla: yes label Oct 1, 2019

bondhugula reviewed Oct 2, 2019

Address Uday's feedback.

dd7684b

dcaballe commented Oct 2, 2019

bondhugula approved these changes Oct 3, 2019

joker-eph assigned andydavis1 Oct 3, 2019

Address feedback

7fb7dec

andydavis1 reviewed Oct 3, 2019

andydavis1 approved these changes Oct 8, 2019

joker-eph added kokoro:run ready to pull labels Oct 9, 2019

kokoro-team removed the kokoro:run label Oct 9, 2019

mlir-copybara-bot closed this in bff10dc Oct 9, 2019

dcaballe mentioned this pull request Nov 22, 2019

AffineLoopFusion: Prevent fusion of multi-out-edge producer loops #259

Closed