
[MLIR][XLA] Fix ops erase bug in DeadTempBufferRemoval #37687

Conversation

xinan-jiang
Contributor

This is a PR from JIZHI, the AI platform in Tencent.
@sherhut @River707

I think calling recursiveErase directly inside function.walk will break the op DAG being traversed, since operations are erased while the walk is still iterating over them. Instead, the operations that need erasing should be collected in a vector during function.walk and erased only after the walk completes.
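The fix follows the standard collect-then-erase pattern. Below is a minimal self-contained C++ sketch of that pattern only (a plain std::vector stands in for the MLIR op list; Op, the dead flag, and removeDeadOps are illustrative names, not the actual pass code):

```cpp
// Sketch of the collect-then-erase pattern (not the actual MLIR pass code).
// Erasing elements of a container while a traversal is still iterating over
// it invalidates the traversal -- the analogue of calling recursiveErase
// inside function.walk. The safe pattern records victims during the walk
// and mutates only afterwards.
#include <algorithm>
#include <string>
#include <vector>

struct Op {
  std::string name;
  bool dead; // stands in for "is a dead temp buffer"
};

// Returns the names of the erased ops; `ops` is mutated only after the walk.
std::vector<std::string> removeDeadOps(std::vector<Op> &ops) {
  // Walk phase: read-only, just collect what to erase.
  std::vector<std::string> opsToErase;
  for (const Op &op : ops)
    if (op.dead)
      opsToErase.push_back(op.name);

  // Erase phase: mutate only after the traversal has finished.
  ops.erase(std::remove_if(ops.begin(), ops.end(),
                           [](const Op &op) { return op.dead; }),
            ops.end());
  return opsToErase;
}
```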

@tensorflow-bot tensorflow-bot bot added the size:S CL Change Size: Small label Mar 18, 2020
@gbaned gbaned self-assigned this Mar 18, 2020
@gbaned gbaned added the comp:xla XLA label Mar 18, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Mar 18, 2020
@gbaned gbaned requested a review from sherhut March 18, 2020 12:12

@sherhut sherhut left a comment


Probably safer to do it this way, thanks. Have you seen this fail for some real example? Just curious.

The reason this works currently is that we only remove the current operation and operations that come later, i.e., never the next operation. So we never delete the operation the walk is pointing to (which is the next one).

Still worth doing. Could you address the one comment and re-upload?

// TODO(herhut): There should be a generic helper for this.
recursiveErase(op);
}
opsToErase.clear();


This is not really needed, as opsToErase is going out of scope here.

Contributor Author


Done.

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Mar 18, 2020
@xinan-jiang xinan-jiang force-pushed the pr/mlir/fix-dead-temp-buffer-removal branch from 882c731 to 9a7ad6f Compare March 19, 2020 07:47
@xinan-jiang
Contributor Author

Below is a real example:

*** IR Dump Before xla::mlir_gpu::{anonymous}::DeadTempBufferRemoval ***
func @fusion.96(%arg0: memref<33708x1024xf32>, %arg1: memref<33708x1024xf32>) {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %cst = constant 3.125000e-02 : f32
  %c33708 = constant 33708 : index
  %c1024 = constant 1024 : index
  %0 = alloc() {temp = true} : memref<33708x1024xf32>
  %1 = alloc() {temp = true} : memref<f32>
  store %cst, %1[] : memref<f32>
  loop.for %arg2 = %c0 to %c33708 step %c1 {
    loop.for %arg3 = %c0 to %c1024 step %c1 {
      %2 = subview %arg0[%arg2, %arg3] [] [] : memref<33708x1024xf32> to memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %3 = subview %0[%arg2, %arg3] [] [] : memref<33708x1024xf32> to memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %4 = subview %arg1[%arg2, %arg3] [] [] : memref<33708x1024xf32> to memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %5 = load %3[%c0, %c0] : memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %cst_0 = constant 3.125000e-02 : f32
      %6 = load %1[] : memref<f32>
      store %cst_0, %3[%c0, %c0] : memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %7 = load %2[%c0, %c0] : memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %8 = load %4[%c0, %c0] : memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %9 = mulf %7, %cst_0 : f32
      store %9, %4[%c0, %c0] : memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
    }
  }
  dealloc %1 : memref<f32>
  dealloc %0 : memref<33708x1024xf32>
  return
}

Fatal Python error: Segmentation fault

@xinan-jiang
Contributor Author

Another question: how do we add unit tests for these passes?

@sherhut

sherhut commented Mar 19, 2020

Thanks for the example. It has a store directly after the alloc, which makes this fail. Thanks for catching this.

Another question: how do we add unit tests for these passes?

Currently these cannot be tested, as they are not visible at all. They all have TODOs to turn them into real implementations as opposed to the prototype state they are currently in. A first step would be to move them into a separate file, adding a PassRegistration so that they become visible and then link them into a mlir-opt like tool. Unfortunately, none of this currently exists. We do not even have a directory structure for it. Let me see whether we can quickly add this.
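For reference, once such a pass gains a PassRegistration and an mlir-opt-style driver, MLIR passes are conventionally unit-tested with lit/FileCheck: a .mlir file whose RUN line pipes the file through the tool and whose CHECK lines match the expected output. A hypothetical sketch follows (the tool name and the -dead-temp-buffer-removal flag are assumptions, since neither existed at the time):

```mlir
// RUN: xla-mlir-gpu-opt -dead-temp-buffer-removal %s | FileCheck %s

// CHECK-LABEL: func @dead_temp
func @dead_temp() {
  %cst = constant 0.000000e+00 : f32
  // The buffer is written but never read, so the pass should erase it.
  // CHECK-NOT: alloc
  %0 = alloc() {temp = true} : memref<f32>
  store %cst, %0[] : memref<f32>
  dealloc %0 : memref<f32>
  return
}
```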

In the meantime, this should still land.

PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Mar 19, 2020
@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 19, 2020
@rthadur rthadur removed kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 23, 2020
@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 23, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 23, 2020
@tensorflow-copybara tensorflow-copybara merged commit 4286b65 into tensorflow:master Mar 23, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Mar 23, 2020
Labels: cla: yes, comp:xla XLA, ready to pull (PR ready for merge process), size:S (CL Change Size: Small)
Projects: PR Queue (Merged)

7 participants