This repository has been archived by the owner on May 22, 2023. It is now read-only.

[DISCUSS] Relax minimum build pipeline #49

Closed
YuchenJin opened this issue Nov 24, 2021 · 7 comments

Comments

@YuchenJin
Collaborator

YuchenJin commented Nov 24, 2021

Key goals

In Relax, we want to have a unified, minimal build API that maps an IRModule to a runtime.Module. `tvm.relax.build(mod: IRModule)` can build any IRModule, no matter what transformations have been applied to the input. This minimal build will enable flexible, customizable compilation pipelines without the need to hack into the core of the compiler, and will allow us to explore new design space. We propose the following interface, and would like to hear your ideas!

Interface

tvm.relax.build(mod: IRModule,
		target: Target) -> runtime.Module

The build API accepts two inputs: an IRModule (mixed Relax/TIR functions) to build, and the target to build for.
The minimum build pipeline should include the passes ToNonDataflow, CallDPSRewrite, VMMemoryLower, and VMShapeLower.
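To make the pass ordering concrete, here is a schematic, plain-Python mock of a Sequential-style pipeline. The `IRModule` class, `make_pass`, and `relax_build` below are illustrative stand-ins, not the real TVM implementations; a real pass would rewrite the module's functions rather than record its name.

```python
# Schematic mock of the minimum build pipeline; everything here is a stand-in.
class IRModule:
    def __init__(self, funcs):
        self.funcs = funcs
        self.applied = []  # record of passes applied, for illustration only

def make_pass(name):
    def _pass(mod):
        mod.applied.append(name)  # a real pass would transform mod.funcs
        return mod
    return _pass

# The four proposed passes, in order.
MINIMUM_PIPELINE = [make_pass(n) for n in
                    ("ToNonDataflow", "CallDPSRewrite",
                     "VMMemoryLower", "VMShapeLower")]

def relax_build(mod):
    for p in MINIMUM_PIPELINE:
        mod = p(mod)
    return mod

mod = relax_build(IRModule({"main": None}))
print(mod.applied)
```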

How to implement

Estimated amount of work: a few hundred LoC.

  • Since the Relax VM is currently the only executor (in the future we might add the Graph executor and AOT executor), we need to write a VMExecutorFactory wrapper class to create the VM executor.
  • If the input IRModule does not contain any Relax function, the output runtime.Module will only contain compiled TIR PrimFuncs.
  • The VM currently takes an executable and a module: vm = relax.VirtualMachine(ex, tvm.cpu(), mod=lib); we need to make it take a VMExecutorFactory instead.

The following code snippet shows the build API and how to create an executor and run a relax program:

# Naming convention in relax:
# mod: IRModule
# rt_mod: runtime.Module

rt_mod: runtime.Module = tvm.relax.build(mod, target)

# We still keep the following API, open for discussion:
vm = relax.VirtualMachine(rt_mod, tvm.cpu())

# New API:
vm = rt_mod["create_executor"](tvm.cpu())
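The VMExecutorFactory wrapper from the bullet list above might look roughly like this. All class names and fields here are schematic mocks (the real classes would live in tvm.relax); the point is that the factory bundles the executable so callers no longer pass `mod=lib` separately.

```python
# Schematic mock of the proposed VMExecutorFactory; not real TVM code.
class VirtualMachine:
    """Stand-in for relax.VirtualMachine."""
    def __init__(self, executable, device):
        self.executable = executable
        self.device = device

class VMExecutorFactory:
    """Wraps a compiled executable and knows how to create a VM executor."""
    def __init__(self, executable):
        self.executable = executable

    def create_executor(self, device):
        # The factory owns everything needed to build the executor,
        # so the caller only supplies the device.
        return VirtualMachine(self.executable, device)

factory = VMExecutorFactory(executable="compiled-bytecode")
vm = factory.create_executor(device="cpu")
print(vm.device)
```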
@ZihengJiang
Contributor

Some comments:

  • The build interface should also include target_host as an argument;
  • The minimum build pipeline should also include the ToNonDataflow pass;
  • If create_executor refers to returning a VM executor, what name should we use to create the graph_executor and aot_executor?

@YuchenJin
Collaborator Author

Thanks @ZihengJiang!

  • The Target API can now accept both target and target_host; for example, we can write target = tvm.target.Target("llvm", host="llvm").
  • Good point, we should also include ToNonDataflow. I updated the proposal.
  • I think the executor is determined during compilation, for example here, so we can call create_executor directly on the runtime.Module.

@ZihengJiang
Contributor

ZihengJiang commented Nov 24, 2021

The third point sounds strange to me. It looks like the executor type is decided by target and target_host, which is interesting...

Another question: why do we choose the rt_mod["create_executor"] syntax instead of create_executor(rt_mod)?

I don't have further comments besides this. I would suggest putting this doc in the last section of https://github.com/tlc-pack/relax/wiki/Relax-Compilation-MVP-Design, since it is mostly about the build interface.

@YuchenJin
Collaborator Author

"Another question is why we choose to use rt_mod["create_executor"] such syntax instead of create_executor(rt_mod)?"

Good question, and happy to discuss it! First of all, I think it's good to have a unified API for creating the executor and running it. Since runtime.Module is a collection of functions, and a user can invoke a packed function via rt_mod["func_name"](input), this may be an easier calling convention to remember than a separate API like relax.create_executor(rt_mod). Since the VM executor is also a runtime.Module, a user can run a ResNet model with vm["resnet50"](input), which is consistent with creating the executor via rt_mod["create_executor"](device).

And this syntax is the same as the current syntax for creating a graph executor from the factory class, except that I think using create_executor instead of default is clearer.
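The "module is a collection of functions" argument above can be illustrated with a toy module whose `__getitem__` looks up packed functions by name. This is a hypothetical mock, not the real runtime.Module; `register`, `ToyRuntimeModule`, and the function names are illustrative stand-ins.

```python
# Toy illustration of the rt_mod["func_name"](args) calling convention.
class ToyRuntimeModule:
    """Mimics runtime.Module: a lookup table of callable functions."""
    def __init__(self):
        self._functions = {}

    def __getitem__(self, name):
        return self._functions[name]

    def register(self, name, fn):
        self._functions[name] = fn

# The executor is itself a module, so inference uses the same
# bracket-call convention: vm["resnet50"](input).
vm = ToyRuntimeModule()
vm.register("resnet50", lambda x: f"logits({x})")

rt_mod = ToyRuntimeModule()
rt_mod.register("create_executor", lambda device: vm)

executor = rt_mod["create_executor"]("cpu")
print(executor["resnet50"]("image"))
```

The symmetry is the point: creating the executor and calling a compiled function both go through the same `module[name](args)` convention.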

@tqchen
Contributor

tqchen commented Nov 24, 2021

If we are going to reuse the relax compilation MVP, let us rename that to relax minimum build to include the broader vision.

@sunggg
Collaborator

sunggg commented Dec 2, 2021

Hi, @YuchenJin. Thank you for the great proposal!
I have a few questions.

  • In your discussion w/ @ZihengJiang, your example seems to determine the executor based on target and target_host during compilation. Is there any specific reason behind this design? I think users may want a different execution mode for the same target and target_host, so it may be more natural to take that information from users.
    [DISCUSS] Relax minimum build pipeline  #49 (comment)
  • Please correct me if I'm wrong: I'm assuming the minimum build would not include any optimization passes. If so, are we going to have a separate discussion for optimization pass management? Unlike conventional compilers, which include optimizations within build and conduct the build in a progressive-lowering fashion, we want to open up more freedom for optimization passes (e.g., allow feedback/profiling-guided search). I think it is worth thinking about what would be possible/impossible in such an optimization pass design, and how to plug new passes into existing ones.

@YuchenJin
Collaborator Author

Thanks @sunggg, these are great questions!

  • Ideally we want to treat the target (compiler/executor) as a first-class citizen in Relax (an idea @junrushao1994 proposed). Users could specify the target through function attributes as follows:
class MyModule:
  @R.func
  def relax_func(...):
    # specify compiler/executor with function attributes
    R.func_attrs({
      "target": {
        "kind": "cuda",
        ...
        "compiler": "vm",
        "executor": "vm-cudagraph",
      }
    })
    ...

  @R.func
  def trt_func(...):
    R.func_attrs({
      "target": {
        "kind": "cuda",
        ...
        "compiler": "tensorrt",
        "executor": "tensorrt",
      }
    })

In this case, target can be optional in the build API. Since we are developing Relax step by step, from manual to automated, we can make the build API take a target parameter for now, and make it optional after we discuss and finish the first-class compiler/executor design and implementation.

  • Right, the minimum build would not include any optimization passes, and we can have separate discussions for optimization pass management and compilation-flow customization. I know you have good insights on these topics after developing Collage; feel free to open discussion threads about them. :)
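If targets live in per-function attributes as sketched above, one way a build step could use them is to group functions by their declared compiler before lowering each group. This is only a hypothetical sketch of that dispatch step; the attribute dicts and `group_by_compiler` helper are assumptions, not existing Relax APIs.

```python
# Hypothetical sketch: group functions by the "compiler" declared in
# their per-function target attributes, mirroring the example above.
funcs = {
    "relax_func": {"target": {"kind": "cuda", "compiler": "vm",
                              "executor": "vm-cudagraph"}},
    "trt_func":   {"target": {"kind": "cuda", "compiler": "tensorrt",
                              "executor": "tensorrt"}},
}

def group_by_compiler(funcs):
    groups = {}
    for name, attrs in funcs.items():
        compiler = attrs["target"]["compiler"]
        groups.setdefault(compiler, []).append(name)
    return groups

print(group_by_compiler(funcs))
```

A build driver could then hand each group to the matching backend, which is why a module-level target argument becomes optional.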
