
[MLIR][XLA] Fix ops erase bug in DeadTempBufferRemoval #37687

Conversation

xinan-jiang
Contributor

This is a PR from JIZHI, the AI platform in Tencent.
@sherhut @River707

I think calling recursiveErase directly inside function.walk will break the op DAG being traversed, since operations are erased while the walk is still iterating over them. Instead, the operations that need erasing should be collected in a vector during function.walk and erased only after the walk completes.
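The fix follows the standard collect-then-erase pattern. Below is a minimal self-contained C++ sketch of that pattern only (a plain std::vector stands in for the MLIR op list; Op, the dead flag, and removeDeadOps are illustrative names, not the actual pass code):

```cpp
// Sketch of the collect-then-erase pattern (not the actual MLIR pass code).
// Erasing elements of a container while a traversal is still iterating over
// it invalidates the traversal -- the analogue of calling recursiveErase
// inside function.walk. The safe pattern records victims during the walk
// and mutates only afterwards.
#include <algorithm>
#include <string>
#include <vector>

struct Op {
  std::string name;
  bool dead; // stands in for "is a dead temp buffer"
};

// Returns the names of the erased ops; `ops` is mutated only after the walk.
std::vector<std::string> removeDeadOps(std::vector<Op> &ops) {
  // Walk phase: read-only, just collect what to erase.
  std::vector<std::string> opsToErase;
  for (const Op &op : ops)
    if (op.dead)
      opsToErase.push_back(op.name);

  // Erase phase: mutate only after the traversal has finished.
  ops.erase(std::remove_if(ops.begin(), ops.end(),
                           [](const Op &op) { return op.dead; }),
            ops.end());
  return opsToErase;
}
```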

@tensorflow-bot tensorflow-bot bot added the size:S CL Change Size: Small label Mar 18, 2020
@gbaned gbaned self-assigned this Mar 18, 2020
@gbaned gbaned added the comp:xla XLA label Mar 18, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Mar 18, 2020
@gbaned gbaned requested a review from sherhut March 18, 2020 12:12

@sherhut sherhut left a comment


Probably safer to do it this way, thanks. Have you seen this fail for some real example? Just curious.

The reason this works currently is that we only remove the current operation and operations that come later, i.e., never the next operation. So we never delete the operation the walk is pointing to (which is the next one).

Still worth doing. Could you address the one comment and re-upload?

// TODO(herhut): There should be a generic helper for this.
recursiveErase(op);
}
opsToErase.clear();


This is not really needed, as opsToErase is going out of scope here.

Contributor Author


Done.

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Mar 18, 2020
@xinan-jiang xinan-jiang force-pushed the pr/mlir/fix-dead-temp-buffer-removal branch from 882c731 to 9a7ad6f Compare March 19, 2020 07:47
@xinan-jiang
Contributor Author

Below is a real example:

*** IR Dump Before xla::mlir_gpu::{anonymous}::DeadTempBufferRemoval ***
func @fusion.96(%arg0: memref<33708x1024xf32>, %arg1: memref<33708x1024xf32>) {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %cst = constant 3.125000e-02 : f32
  %c33708 = constant 33708 : index
  %c1024 = constant 1024 : index
  %0 = alloc() {temp = true} : memref<33708x1024xf32>
  %1 = alloc() {temp = true} : memref<f32>
  store %cst, %1[] : memref<f32>
  loop.for %arg2 = %c0 to %c33708 step %c1 {
    loop.for %arg3 = %c0 to %c1024 step %c1 {
      %2 = subview %arg0[%arg2, %arg3] [] [] : memref<33708x1024xf32> to memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %3 = subview %0[%arg2, %arg3] [] [] : memref<33708x1024xf32> to memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %4 = subview %arg1[%arg2, %arg3] [] [] : memref<33708x1024xf32> to memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %5 = load %3[%c0, %c0] : memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %cst_0 = constant 3.125000e-02 : f32
      %6 = load %1[] : memref<f32>
      store %cst_0, %3[%c0, %c0] : memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %7 = load %2[%c0, %c0] : memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %8 = load %4[%c0, %c0] : memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
      %9 = mulf %7, %cst_0 : f32
      store %9, %4[%c0, %c0] : memref<1x1xf32, affine_map<(d0, d1)[s0] -> (d0 * 1024 + s0 + d1)>>
    }
  }
  dealloc %1 : memref<f32>
  dealloc %0 : memref<33708x1024xf32>
  return
}

Fatal Python error: Segmentation fault

@xinan-jiang
Contributor Author

Another question: how do we add unit tests for these passes?

@sherhut

sherhut commented Mar 19, 2020

Thanks for the example. It has a store directly after the alloc, which makes this fail. Thanks for catching this.

Another question: how do we add unit tests for these passes?

Currently these cannot be tested, as they are not visible at all. They all have TODOs to turn them into real implementations as opposed to the prototype state they are currently in. A first step would be to move them into a separate file, adding a PassRegistration so that they become visible and then link them into a mlir-opt like tool. Unfortunately, none of this currently exists. We do not even have a directory structure for it. Let me see whether we can quickly add this.
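For reference, once such a pass gains a PassRegistration and an mlir-opt-style driver, MLIR passes are conventionally unit-tested with lit/FileCheck: a .mlir file whose RUN line pipes the file through the tool and whose CHECK lines match the expected output. A hypothetical sketch follows (the tool name and the -dead-temp-buffer-removal flag are assumptions, since neither existed at the time):

```mlir
// RUN: xla-mlir-gpu-opt -dead-temp-buffer-removal %s | FileCheck %s

// CHECK-LABEL: func @dead_temp
func @dead_temp() {
  %cst = constant 0.000000e+00 : f32
  // The buffer is written but never read, so the pass should erase it.
  // CHECK-NOT: alloc
  %0 = alloc() {temp = true} : memref<f32>
  store %cst, %0[] : memref<f32>
  dealloc %0 : memref<f32>
  return
}
```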

In the meantime, this should still land.

PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Mar 19, 2020
@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 19, 2020
@rthadur rthadur removed kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 23, 2020
@tensorflow-bot tensorflow-bot bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 23, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 23, 2020
@tensorflow-copybara tensorflow-copybara merged commit 4286b65 into tensorflow:master Mar 23, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Mar 23, 2020
Labels: cla: yes, comp:xla XLA, ready to pull (PR ready for merge process), size:S (CL Change Size: Small)
Projects: PR Queue (Merged)

7 participants