AffineLoopFusion: Prevent fusion of multi-out-edge producer loops #259
AffineLoopFusion: Prevent fusion of multi-out-edge producer loops #259
Conversation
tensorflow#162 introduced a bug that incorrectly allowed fusion of producer loops with multiple outgoing edges. This commit fixes that problem. It also introduces a new flag to disable sibling loop fusion so that we can test producer-consumer fusion in isolation.
lib/Transforms/LoopFusion.cpp
Outdated
"Expected only outgoing edges from memref in srcId"); | ||
|
||
// Return false if 'srcNode' has more than one output edge on 'memref'. | ||
if (memrefNumOutEdges != 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> 1
?
lib/Transforms/LoopFusion.cpp
Outdated
fuseProducerConsumerNodes(/*maxSrcUserCount=*/1); | ||
fuseSiblingNodes(); | ||
if (clSiblingFusion) { | ||
fuseProducerConsumerNodes(/*maxSrcUserCount=*/1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the call to fuseProducerConsumerNodes here under clSiblingFusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed that if sibling fusion was disabled, only one fuseProducerConsumerNodes
pass was needed and I left the one with the highest maxSrcUserCount
. I now see comment in line 1472 and I'm not sure if a single pass will address all the producer-consumer cases. Could a first pass expose new producer-consumer cases that are not handled by the own pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please put call the "fuseSiblingNodes()" under the flag "clSiblingFusion", otherwise I think users will get confused. If you need to disable producer-consumer fusion, then it should be under a separate flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, we may not need the flag at all, if you can construct a test case which is illegal for sibling fusion.
// producer loop
for ...
store %v, %a[%i]
// Consumer loop 1
for ...
load %a[%i]
store %v, %b[%i]
// Consumer loop 2
for ...
load %a[%i]
load %b[%i]
The dependence on '%b' between the two consumer nodes should prevent sibling fusion
test/Transforms/loop-fusion.mlir
Outdated
// multiple outgoing edges. Sibling loop fusion is disabled to properly test | ||
// producer-consumer fusion in isolation. | ||
// CHECK-LABEL: func @should_not_fuse_multi_outgoing_edge_store_producer | ||
func @should_not_fuse_multi_outgoing_edge_store_producer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Producer consumer fusion by itself is expected to fuse everything here. You shouldn't need sibling fusion to be turned on for fusion here. As such, the desired output even with -sibling-fusion=false would be the fused nest. While the fused nest could be seen as a sibling fusion followed by producer consumer fusion (which is probably what you had in mind?), it can also be viewed just as a sequence of two producer consumer fusions.
Disabling sibling fusion would be expected to stop fusion in other cases like these.
for %i = ...
%v = affine.load %A[%i]
affine.store %v, %B[%i]
for %j = ...
%v = affine.load %A[%i]
affine.store %v, %C[%i]
There is no producer consumer relationship or reuse, but "sibling fusion" (being for input/RAR reuse) will fuse these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense to me. A mature producer-consumer fusion algorithm should fuse loops below. However, what I'm trying to test here is that the current producer-consumer algorithm is not fusing a producer with more than one outgoing edge because that is not supported currently (note the three NOSIBLING
checks). If I didn't disable sibling fusion, producer-consumer algorithm wouldn't fuse the loops but sibling fusion would and I couldn't be sure who did it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try a unit test as follows. This test would not allow sibling fusion of the two consumer loop nests because of the dependence on '%b'.
// producer loop
for ...
store %v, %a[%i]
// Consumer loop 1
for ...
load %a[%i]
store %v, %b[%i]
// Consumer loop 2
for ...
load %a[%i]
load %b[%i]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I’ve been playing with the suggested test case:
func @should_not_fuse_multi_outgoing_edge_store_producer2(%a : memref<1xf32>, %b : memref<1xf32>) {
%cst = constant 0.000000e+00 : f32
affine.for %arg0 = 0 to 1 {
affine.store %cst, %a[%arg0] : memref<1xf32>
}
affine.for %arg0 = 0 to 1 {
%0 = affine.load %a[%arg0] : memref<1xf32>
affine.store %cst, %b[%arg0] : memref<1xf32>
}
affine.for %arg0 = 0 to 1 {
%0 = affine.load %a[%arg0] : memref<1xf32>
%1 = affine.load %b[%arg0] : memref<1xf32>
}
return
}
Producer-consumer fusion fuses the three loops. It fuses 2nd and 3rd first, then 1st becomes a producer with a single outgoing edge and it’s fused later so we cannot use this test.
However, I noticed that if I make ‘%a’ and external memref in my original test case, I can test exactly what I want and sibling fusion doesn’t kick in. I think we can proceed this way and remove clSiblingFusion
for now, although I kind of like that flag. It's useful for debugging/independent testing.
Thanks for the comments, @bondhugula! |
Hi Diego, thanks for contributing! I can take a look at this... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified original test and removed clSiblingFusion
.
I also added a new test (and fixes related to it) that it asserted now. It’s a producer with multiple outgoing edges where there is a single store with a single outgoing edge.
Please, let me know if you have any other comments.
Thanks!
test/Transforms/loop-fusion.mlir
Outdated
// multiple outgoing edges. Sibling loop fusion is disabled to properly test | ||
// producer-consumer fusion in isolation. | ||
// CHECK-LABEL: func @should_not_fuse_multi_outgoing_edge_store_producer | ||
func @should_not_fuse_multi_outgoing_edge_store_producer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I’ve been playing with the suggested test case:
func @should_not_fuse_multi_outgoing_edge_store_producer2(%a : memref<1xf32>, %b : memref<1xf32>) {
%cst = constant 0.000000e+00 : f32
affine.for %arg0 = 0 to 1 {
affine.store %cst, %a[%arg0] : memref<1xf32>
}
affine.for %arg0 = 0 to 1 {
%0 = affine.load %a[%arg0] : memref<1xf32>
affine.store %cst, %b[%arg0] : memref<1xf32>
}
affine.for %arg0 = 0 to 1 {
%0 = affine.load %a[%arg0] : memref<1xf32>
%1 = affine.load %b[%arg0] : memref<1xf32>
}
return
}
Producer-consumer fusion fuses the three loops. It fuses 2nd and 3rd first, then 1st becomes a producer with a single outgoing edge and it’s fused later so we cannot use this test.
However, I noticed that if I make ‘%a’ and external memref in my original test case, I can test exactly what I want and sibling fusion doesn’t kick in. I think we can proceed this way and remove clSiblingFusion
for now, although I kind of like that flag. It's useful for debugging/independent testing.
I'm a bit confused here. Why should the fusion be prevented here in cases like these where it's valid (referring to the test case you added) - where you have a single producer and multiple consumers? I recall this was what you had addressed in your earlier commit. Are you introducing the limitation back? :-) |
My previous commit introduced support for multi-store producers with a single outgoing edge within the function. That is still working fine. The test I added is intact: https://github.com/tensorflow/mlir/pull/259/files/b8e7f532a846e8a4f76b04473db21e9ffb7d15f7#diff-ea810dda41c98feaab3d122f33648fb5R2327. However, as part of that change, I asserted on that producers reaching Does it make sense now? :) |
I see - sounds good then. |
tensorflow/mlir#162 introduced a bug that incorrectly allowed fusion of producer loops with multiple outgoing edges. This commit fixes that problem. It also introduces a new flag to disable sibling loop fusion so that we can test producer-consumer fusion in isolation. Closes #259 COPYBARA_INTEGRATE_REVIEW=tensorflow/mlir#259 from dcaballe:dcaballe/fix_multi_out_edge_producer_fusion 578d5661705fd5c56c555832d5e0528df88c5282 PiperOrigin-RevId: 283531105 Change-Id: I3a6173463ea20bd35555c24fa451bfbf2dfac098
tensorflow/mlir#162 introduced a bug that incorrectly allowed fusion of producer loops with multiple outgoing edges. This commit fixes that problem. It also introduces a new flag to disable sibling loop fusion so that we can test producer-consumer fusion in isolation. Closes tensorflow/mlir#259 COPYBARA_INTEGRATE_REVIEW=tensorflow/mlir#259 from dcaballe:dcaballe/fix_multi_out_edge_producer_fusion 578d5661705fd5c56c555832d5e0528df88c5282 PiperOrigin-RevId: 283531105
In #162 I introduced a bug that
incorrectly allowed fusion of producer loops with multiple outgoing
edges. This commit fixes that problem. It also introduces a new flag to
disable sibling loop fusion so that we can test producer-consumer fusion
in isolation.