Skip to content
This repository has been archived by the owner on Apr 23, 2021. It is now read-only.

AffineLoopFusion: Prevent fusion of multi-out-edge producer loops #259

Conversation

dcaballe
Copy link
Contributor

In #162 I introduced a bug that
incorrectly allowed fusion of producer loops with multiple outgoing
edges. This commit fixes that problem. It also introduces a new flag to
disable sibling loop fusion so that we can test producer-consumer fusion
in isolation.

tensorflow#162 introduced a bug that
incorrectly allowed fusion of producer loops with multiple outgoing
edges. This commit fixes that problem. It also introduces a new flag to
disable sibling loop fusion so that we can test producer-consumer fusion
in isolation.
"Expected only outgoing edges from memref in srcId");

// Return false if 'srcNode' has more than one output edge on 'memref'.
if (memrefNumOutEdges != 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> 1 ?

fuseProducerConsumerNodes(/*maxSrcUserCount=*/1);
fuseSiblingNodes();
if (clSiblingFusion) {
fuseProducerConsumerNodes(/*maxSrcUserCount=*/1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the call to fuseProducerConsumerNodes here under clSiblingFusion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that if sibling fusion was disabled, only one fuseProducerConsumerNodes pass was needed and I left the one with the highest maxSrcUserCount. I now see comment in line 1472 and I'm not sure if a single pass will address all the producer-consumer cases. Could a first pass expose new producer-consumer cases that are not handled by the own pass?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please put call the "fuseSiblingNodes()" under the flag "clSiblingFusion", otherwise I think users will get confused. If you need to disable producer-consumer fusion, then it should be under a separate flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, we may not need the flag at all, if you can construct a test case which is illegal for sibling fusion.

// producer loop
for ...
store %v, %a[%i]

// Consumer loop 1
for ...
load %a[%i]
store %v, %b[%i]

// Consumer loop 2
for ...
load %a[%i]
load %b[%i]

The dependence on '%b' between the two consumer nodes should prevent sibling fusion

// multiple outgoing edges. Sibling loop fusion is disabled to properly test
// producer-consumer fusion in isolation.
// CHECK-LABEL: func @should_not_fuse_multi_outgoing_edge_store_producer
func @should_not_fuse_multi_outgoing_edge_store_producer() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Producer consumer fusion by itself is expected to fuse everything here. You shouldn't need sibling fusion to be turned on for fusion here. As such, the desired output even with -sibling-fusion=false would be the fused nest. While the fused nest could be seen as a sibling fusion followed by producer consumer fusion (which is probably what you had in mind?), it can also be viewed just as a sequence of two producer consumer fusions.

Disabling sibling fusion would be expected to stop fusion in other cases like these.

for %i =  ...
   %v = affine.load %A[%i]
   affine.store %v, %B[%i]

for %j = ...
   %v = affine.load %A[%i]
   affine.store %v, %C[%i]

There is no producer consumer relationship or reuse, but "sibling fusion" (being for input/RAR reuse) will fuse these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense to me. A mature producer-consumer fusion algorithm should fuse loops below. However, what I'm trying to test here is that the current producer-consumer algorithm is not fusing a producer with more than one outgoing edge because that is not supported currently (note the three NOSIBLING checks). If I didn't disable sibling fusion, producer-consumer algorithm wouldn't fuse the loops but sibling fusion would and I couldn't be sure who did it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try a unit test as follows. This test would not allow sibling fusion of the two consumer loop nests because of the dependence on '%b'.

// producer loop
for ...
store %v, %a[%i]

// Consumer loop 1
for ...
load %a[%i]
store %v, %b[%i]

// Consumer loop 2
for ...
load %a[%i]
load %b[%i]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I’ve been playing with the suggested test case:

func @should_not_fuse_multi_outgoing_edge_store_producer2(%a : memref<1xf32>, %b : memref<1xf32>) {
    %cst = constant 0.000000e+00 : f32
    affine.for %arg0 = 0 to 1 {
      affine.store %cst, %a[%arg0] : memref<1xf32>
    }
 
    affine.for %arg0 = 0 to 1 {
      %0 = affine.load %a[%arg0] : memref<1xf32>
      affine.store %cst, %b[%arg0] : memref<1xf32>
    }

    affine.for %arg0 = 0 to 1 {
      %0 = affine.load %a[%arg0] : memref<1xf32>
      %1 = affine.load %b[%arg0] : memref<1xf32>
    }
    return
  } 

Producer-consumer fusion fuses the three loops. It fuses 2nd and 3rd first, then 1st becomes a producer with a single outgoing edge and it’s fused later so we cannot use this test.
However, I noticed that if I make ‘%a’ and external memref in my original test case, I can test exactly what I want and sibling fusion doesn’t kick in. I think we can proceed this way and remove clSiblingFusion for now, although I kind of like that flag. It's useful for debugging/independent testing.

@dcaballe
Copy link
Contributor Author

Thanks for the comments, @bondhugula!
I can't add reviewers to MLIR PRs probably because I don't have permission. Maybe @andydavis1 is also interested?

@andydavis1
Copy link
Contributor

Hi Diego, thanks for contributing! I can take a look at this...

Copy link
Contributor Author

@dcaballe dcaballe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified original test and removed clSiblingFusion.
I also added a new test (and fixes related to it) that it asserted now. It’s a producer with multiple outgoing edges where there is a single store with a single outgoing edge.

Please, let me know if you have any other comments.
Thanks!

// multiple outgoing edges. Sibling loop fusion is disabled to properly test
// producer-consumer fusion in isolation.
// CHECK-LABEL: func @should_not_fuse_multi_outgoing_edge_store_producer
func @should_not_fuse_multi_outgoing_edge_store_producer() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I’ve been playing with the suggested test case:

func @should_not_fuse_multi_outgoing_edge_store_producer2(%a : memref<1xf32>, %b : memref<1xf32>) {
    %cst = constant 0.000000e+00 : f32
    affine.for %arg0 = 0 to 1 {
      affine.store %cst, %a[%arg0] : memref<1xf32>
    }
 
    affine.for %arg0 = 0 to 1 {
      %0 = affine.load %a[%arg0] : memref<1xf32>
      affine.store %cst, %b[%arg0] : memref<1xf32>
    }

    affine.for %arg0 = 0 to 1 {
      %0 = affine.load %a[%arg0] : memref<1xf32>
      %1 = affine.load %b[%arg0] : memref<1xf32>
    }
    return
  } 

Producer-consumer fusion fuses the three loops. It fuses 2nd and 3rd first, then 1st becomes a producer with a single outgoing edge and it’s fused later so we cannot use this test.
However, I noticed that if I make ‘%a’ and external memref in my original test case, I can test exactly what I want and sibling fusion doesn’t kick in. I think we can proceed this way and remove clSiblingFusion for now, although I kind of like that flag. It's useful for debugging/independent testing.

@bondhugula
Copy link
Contributor

Modified original test and removed clSiblingFusion.
I also added a new test (and fixes related to it) that it asserted now. It’s a producer with multiple outgoing edges where there is a single store with a single outgoing edge.

Please, let me know if you have any other comments.
Thanks!

I'm a bit confused here. Why should the fusion be prevented here in cases like these where it's valid (referring to the test case you added) - where you have a single producer and multiple consumers? I recall this was what you had addressed in your earlier commit. Are you introducing the limitation back? :-)

@dcaballe
Copy link
Contributor Author

I'm a bit confused here. Why should the fusion be prevented here in cases like these where it's valid (referring to the test case you added) - where you have a single producer and multiple consumers? I recall this was what you had addressed in your earlier commit. Are you introducing the limitation back? :-)

My previous commit introduced support for multi-store producers with a single outgoing edge within the function. That is still working fine. The test I added is intact: https://github.com/tensorflow/mlir/pull/259/files/b8e7f532a846e8a4f76b04473db21e9ffb7d15f7#diff-ea810dda41c98feaab3d122f33648fb5R2327. However, as part of that change, I asserted on that producers reaching canFuseSrcWhichWritesToLiveOut should have only one outgoing edge (https://github.com/tensorflow/mlir/pull/162/files#diff-5b47f3417e354b0773b092edc5d8740cR1014). That is not true because a single store may have multiple outgoing edges and loads can also have outgoing edges. That is the piece I’m reverting and adding test for it, which is a bit complicated due to the producer-consumer fusion and sibling fusion interaction.

Does it make sense now? :)

@bondhugula
Copy link
Contributor

I'm a bit confused here. Why should the fusion be prevented here in cases like these where it's valid (referring to the test case you added) - where you have a single producer and multiple consumers? I recall this was what you had addressed in your earlier commit. Are you introducing the limitation back? :-)

My previous commit introduced support for multi-store producers with a single outgoing edge within the function. That is still working fine. The test I added is intact: https://github.com/tensorflow/mlir/pull/259/files/b8e7f532a846e8a4f76b04473db21e9ffb7d15f7#diff-ea810dda41c98feaab3d122f33648fb5R2327. However, as part of that change, I asserted on that producers reaching canFuseSrcWhichWritesToLiveOut should have only one outgoing edge (https://github.com/tensorflow/mlir/pull/162/files#diff-5b47f3417e354b0773b092edc5d8740cR1014). That is not true because a single store may have multiple outgoing edges and loads can also have outgoing edges. That is the piece I’m reverting and adding test for it, which is a bit complicated due to the producer-consumer fusion and sibling fusion interaction.

Does it make sense now? :)

I see - sounds good then.

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Dec 3, 2019
tensorflow/mlir#162 introduced a bug that
incorrectly allowed fusion of producer loops with multiple outgoing
edges. This commit fixes that problem. It also introduces a new flag to
disable sibling loop fusion so that we can test producer-consumer fusion
in isolation.

Closes #259

COPYBARA_INTEGRATE_REVIEW=tensorflow/mlir#259 from dcaballe:dcaballe/fix_multi_out_edge_producer_fusion 578d5661705fd5c56c555832d5e0528df88c5282
PiperOrigin-RevId: 283531105
Change-Id: I3a6173463ea20bd35555c24fa451bfbf2dfac098
swift-ci pushed a commit to apple/llvm-project that referenced this pull request Dec 24, 2019
tensorflow/mlir#162 introduced a bug that
incorrectly allowed fusion of producer loops with multiple outgoing
edges. This commit fixes that problem. It also introduces a new flag to
disable sibling loop fusion so that we can test producer-consumer fusion
in isolation.

Closes tensorflow/mlir#259

COPYBARA_INTEGRATE_REVIEW=tensorflow/mlir#259 from dcaballe:dcaballe/fix_multi_out_edge_producer_fusion 578d5661705fd5c56c555832d5e0528df88c5282
PiperOrigin-RevId: 283531105
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants