[MLIR] HLO to LHLO conversion and fusion #41424

Open

MaheshRavishankar opened this issue Jul 15, 2020 · 5 comments
@MaheshRavishankar

I see that recently there has been a lot of work from @bondhugula on HLO to LHLO conversion. I am wondering what the plan here is w.r.t. fusion. Is the fusion being done in HLO or LHLO? I remember that a previous PR added fusion to XLA-HLO (using OpInterfaces); is that being used? Or is the plan to do fusion in LHLO, which operates on memrefs?

The reason I ask is that in IREE we have been relying heavily on fusion at the tensor level rather than at the buffer level. For elementwise operations this has worked fairly well so far. The following two passes implement fusion that covers a large number of elementwise operations:

  1. The conversion from HLO to Linalg on tensors here
  2. Fusion of linalg operations on tensors here

The latter covers fusion of generic ops, indexed generic ops, and tensor reshape ops with each other. Fusion at the tensor level is much easier to do, and the fusion is effectively a fixed-point iteration toward maximal fusion. (There might be cases where maximal fusion is detrimental, but I haven't encountered one when elementwise operations are involved.) I wanted to see whether the overall codegen in TensorFlow would benefit from using these passes. They aren't complete and might need enhancements to cover additional cases we haven't encountered in IREE, but so far, for fusing elementwise operations, these two passes together have worked very well for us. I am trying to see if we can align codegen strategies between IREE and TF so that we can build more shared components.
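
For concreteness, here is a rough sketch of how the two passes could be composed in an MLIR pass pipeline. The HLO-to-Linalg factory function and its namespace are assumptions based on the mlir-hlo/TensorFlow tree; the tensor-fusion pass is the one in MLIR core, though the exact names may differ from what is shown.

```cpp
// Sketch only: pass factory names and headers are assumptions, not the
// authoritative TF/MLIR API.
#include <memory>

#include "mlir/Dialect/Linalg/Passes.h"  // assumed home of the tensor-fusion pass
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"

namespace mlir {
namespace mhlo {
// Assumed factory from the mlir-hlo tree: HLO -> linalg.generic on tensors.
std::unique_ptr<Pass> createLegalizeHloToLinalgPass();
}  // namespace mhlo
}  // namespace mlir

// Build a pipeline that (1) converts HLO ops to Linalg on tensors and
// (2) greedily fuses the resulting generic/indexed_generic ops.
static void buildTensorFusionPipeline(mlir::OpPassManager &pm) {
  // Step 1: HLO -> Linalg on tensors.
  pm.addPass(mlir::mhlo::createLegalizeHloToLinalgPass());
  // Step 2: fuse Linalg ops on tensors. The pass rewrites producer/consumer
  // pairs until a fixed point is reached, so chains of elementwise ops
  // collapse into a single linalg.generic.
  pm.addPass(mlir::createLinalgFusionOfTensorOpsPass());
}
```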

@stellaraccident, @sherhut, @joker-eph, @antiagainst, @nicolasvasilache for comments/visibility.

@joker-eph

Does it have to be either/or, or can we have both? Maybe with a different focus at different levels/layers of the stack?
(At the moment I believe the only fusion pass in MLIR-HLO land operates on tensors.)

@sherhut commented Jul 17, 2020

We also have an implementation of fusion in Linalg that works at the buffer level (in the same file that was linked, called fuseLinalgOpsGreedily). The approach there is different, and I agree with @MaheshRavishankar that fusing at the tensor level is often simpler, as one does not need to trace reads and writes to buffers (which is implemented as linalg::LinalgDependenceGraph). @pifon2a experimented with using tensor-based fusion in the XLA emitter pipeline and that worked well. The main obstacle back then was the lack of bufferization support for Linalg, which has now landed in MLIR core, so we could revisit it. It is not a high priority right now, though.

One difference we have in XLA is that we have a fusion pass that pre-computes which operations we want to fuse, so we do not use a fixed point during the actual fusion on Linalg itself. I believe that is also what the aforementioned interfaces are used for. This is modeled with a FusionOp node, and we just fuse everything starting from the result of that node.
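
To illustrate the difference (a hypothetical sketch, not XLA's actual code): once a pre-pass has grouped operations under a FusionOp, the rewrite only has to walk backwards from the result of that group, rather than iterating over the whole module to a fixed point. fuseProducerInto below is a placeholder, not a real MLIR entry point.

```cpp
// Hypothetical sketch of root-driven fusion within a pre-computed fusion
// group. fuseProducerInto is a placeholder, not a real MLIR API; assume it
// merges the producer's body into the consumer in place and returns true
// on success.
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/Operation.h"
#include "mlir/IR/Value.h"

bool fuseProducerInto(mlir::Operation *producer, mlir::Operation *consumer);

// Starting from the op that produces the fusion group's result, pull its
// producers in one by one. No module-wide fixed-point iteration is needed,
// because group membership was already decided by the earlier fusion pass.
static void fuseFromRoot(mlir::Operation *root) {
  llvm::SmallVector<mlir::Operation *, 8> worklist = {root};
  while (!worklist.empty()) {
    mlir::Operation *consumer = worklist.pop_back_val();
    for (mlir::Value operand : consumer->getOperands()) {
      mlir::Operation *producer = operand.getDefiningOp();
      if (producer && fuseProducerInto(producer, consumer)) {
        worklist.push_back(consumer);  // operands changed; re-scan the consumer
        break;                         // do not keep iterating a mutated list
      }
    }
  }
}
```

The point is only that group membership is decided up front; the local rewrite that merges two ops could be the same one a fixed-point driver would use.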

I think we already share the core components (Linalg fusion, which is in core, and HLO to Linalg, which is in TensorFlow), and we should expand on building these out. There are some Linalg-to-HLO patterns, for example, that would be great to move from IREE to TensorFlow.

@MaheshRavishankar (Author)

Thanks @joker-eph and @sherhut for the replies.

> Does it have to be either/or, or can we have both? Maybe with a different focus at different levels/layers of the stack?
> (At the moment I believe the only fusion pass in MLIR-HLO land operates on tensors.)

We will need some fusion at the buffer level for sure. Operations that have reduction semantics cannot be represented as well in Linalg on tensors (it's an ongoing discussion), so fusing things like matmul/conv/reduce with other ops will probably use Linalg on buffers or the vector dialect.

> The main obstacle back then was the lack of bufferization support for Linalg, which has now landed in MLIR core, so we could revisit it. It is not a high priority right now, though.

FWIW, the conversion from HLO to Linalg and the fusion live in TF and MLIR respectively, and just invoking the two passes should give you that. Then the missing piece, as you mention, is the bufferization. But if I understand correctly, it's similar to the one being used (or being considered) for buffer allocation from HLO -> LHLO. So using the fusion-on-tensors pass in Linalg should hopefully not take a lot of work. If there are any issues, we can address them, since we are using these passes and have an incentive to fix them.
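
A minimal sketch of that composition, assuming a placeholder bufferization pass (the real pass would be whatever the HLO -> LHLO buffer-allocation work provides):

```cpp
// Extends the pipeline sketch from the first comment: after tensor-level
// fusion, a bufferization step lowers the fused Linalg-on-tensors ops to
// Linalg on buffers. createTensorToBufferPass is a placeholder name.
#include <memory>

#include "mlir/Dialect/Linalg/Passes.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"

namespace mlir {
// Placeholder for the shared tensor -> memref buffer-allocation pass.
std::unique_ptr<Pass> createTensorToBufferPass();
}  // namespace mlir

static void addFusionAndBufferization(mlir::OpPassManager &pm) {
  pm.addPass(mlir::createLinalgFusionOfTensorOpsPass());  // fuse on tensors
  pm.addPass(mlir::createTensorToBufferPass());           // then allocate buffers
}
```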

> One difference we have in XLA is that we have a fusion pass that pre-computes which operations we want to fuse, so we do not use a fixed point during the actual fusion on Linalg itself. I believe that is also what the aforementioned interfaces are used for. This is modeled with a FusionOp node, and we just fuse everything starting from the result of that node.

Not sure how using a fixed point doesn't achieve what you want. In IREE too, the "things to be fused" are decided beforehand. Using the fusion on tensors should give you the same thing.

> I think we already share the core components (Linalg fusion, which is in core, and HLO to Linalg, which is in TensorFlow), and we should expand on building these out. There are some Linalg-to-HLO patterns, for example, that would be great to move from IREE to TensorFlow.

This has been a source of confusion for a long time. The only ones we have in IREE go from HLO -> Linalg on buffers. The reason they are in IREE is that, AFAICS, the TF codegen strategy is to go from HLO -> LHLO -> Linalg, which diverges a bit. IREE uses HLO as an "input" IR for now; all other dialects used in IREE live either in IREE or in MLIR. Is there something specific you are looking at that you think we could port from IREE to TF, outside of the HLO -> Linalg on buffers conversion?

@joker-eph commented Jul 17, 2020 via email

@sherhut commented Jul 23, 2020

> The main obstacle back then was the lack of bufferization support for Linalg, which has now landed in MLIR core, so we could revisit it. It is not a high priority right now, though.

> FWIW, the conversion from HLO to Linalg and the fusion live in TF and MLIR respectively, and just invoking the two passes should give you that. Then the missing piece, as you mention, is the bufferization. But if I understand correctly, it's similar to the one being used (or being considered) for buffer allocation from HLO -> LHLO. So using the fusion-on-tensors pass in Linalg should hopefully not take a lot of work. If there are any issues, we can address them, since we are using these passes and have an incentive to fix them.

I meant to say that it was an obstacle, and that the generic buffer allocation support in MLIR could handle it now. It simply was not a priority to get done, but that might change once this code gets more use again.

> One difference we have in XLA is that we have a fusion pass that pre-computes which operations we want to fuse, so we do not use a fixed point during the actual fusion on Linalg itself. I believe that is also what the aforementioned interfaces are used for. This is modeled with a FusionOp node, and we just fuse everything starting from the result of that node.

> Not sure how using a fixed point doesn't achieve what you want. In IREE too, the "things to be fused" are decided beforehand. Using the fusion on tensors should give you the same thing.

I am not saying that a fixed point does not achieve it. I was just trying to explain what the recently landed fusion pass is about and why those interfaces were added. That is independent of whether we fuse on tensors or buffers, and it is certainly not an argument against fusion on tensors. In fact, I am not arguing against it at all, just providing context.

> I think we already share the core components (Linalg fusion, which is in core, and HLO to Linalg, which is in TensorFlow), and we should expand on building these out. There are some Linalg-to-HLO patterns, for example, that would be great to move from IREE to TensorFlow.

> This has been a source of confusion for a long time. The only ones we have in IREE go from HLO -> Linalg on buffers. The reason they are in IREE is that, AFAICS, the TF codegen strategy is to go from HLO -> LHLO -> Linalg, which diverges a bit. IREE uses HLO as an "input" IR for now; all other dialects used in IREE live either in IREE or in MLIR. Is there something specific you are looking at that you think we could port from IREE to TF, outside of the HLO -> Linalg on buffers conversion?

I wondered about these patterns. They go from HLO to Linalg and could hence be placed in the TF/HLO repos.
