This repository has been archived by the owner on May 22, 2023. It is now read-only.

[DISCUSS] Relax minimum build pipeline #49

Closed
YuchenJin opened this issue Nov 24, 2021 · 7 comments

Comments

@YuchenJin
Collaborator

YuchenJin commented Nov 24, 2021

Key goals

In Relax, we want to have a unified, minimal build API that maps an IRModule to a runtime.Module. `tvm.relax.build(mod: IRModule)` can build any IRModule, no matter what transformations have been applied to the input. This minimal build will enable flexible, customizable compilation pipelines without the need to hack into the core of the compiler, and will allow us to explore new design space. We propose the following interface, and would like to hear your ideas!

Interface

tvm.relax.build(mod: IRModule,
		target: Target) -> runtime.Module

The build API accepts two inputs: an IRModule (mixed Relax/TIR functions) to build, and the target to build for.
The minimum build pipeline should include the passes ToNonDataflow, CallDPSRewrite, VMMemoryLower, and VMShapeLower.
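To make the pass ordering concrete, here is a schematic, plain-Python mock of a Sequential-style pipeline. The `IRModule` class, `make_pass`, and `relax_build` below are illustrative stand-ins, not the real TVM implementations; a real pass would rewrite the module's functions rather than record its name.

```python
# Schematic mock of the minimum build pipeline; everything here is a stand-in.
class IRModule:
    def __init__(self, funcs):
        self.funcs = funcs
        self.applied = []  # record of passes applied, for illustration only

def make_pass(name):
    def _pass(mod):
        mod.applied.append(name)  # a real pass would transform mod.funcs
        return mod
    return _pass

# The four proposed passes, in order.
MINIMUM_PIPELINE = [make_pass(n) for n in
                    ("ToNonDataflow", "CallDPSRewrite",
                     "VMMemoryLower", "VMShapeLower")]

def relax_build(mod):
    for p in MINIMUM_PIPELINE:
        mod = p(mod)
    return mod

mod = relax_build(IRModule({"main": None}))
print(mod.applied)
```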

How to implement

Estimated amount of work: a few hundred LoC.

  • Since the Relax VM is currently the only executor (in the future we might add the Graph executor and AOT executor), we need to write a VMExecutorFactory wrapper class to create the VM executor.
  • If the input IRModule does not contain any Relax function, the output runtime.Module will only contain compiled TIR PrimFuncs.
  • The VM currently takes an executable and a module: vm = relax.VirtualMachine(ex, tvm.cpu(), mod=lib); we need to make it take a VMExecutorFactory instead.

The following code snippet shows the build API and how to create an executor and run a relax program:

# Naming convention in relax:
# mod: IRModule
# rt_mod: runtime.Module

rt_mod: runtime.Module = tvm.relax.build(mod, target)

# We still keep the following API, open for discussion:
vm = relax.VirtualMachine(rt_mod, tvm.cpu())

# New API:
vm = rt_mod["create_executor"](tvm.cpu())
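The VMExecutorFactory wrapper from the bullet list above might look roughly like this. All class names and fields here are schematic mocks (the real classes would live in tvm.relax); the point is that the factory bundles the executable so callers no longer pass `mod=lib` separately.

```python
# Schematic mock of the proposed VMExecutorFactory; not real TVM code.
class VirtualMachine:
    """Stand-in for relax.VirtualMachine."""
    def __init__(self, executable, device):
        self.executable = executable
        self.device = device

class VMExecutorFactory:
    """Wraps a compiled executable and knows how to create a VM executor."""
    def __init__(self, executable):
        self.executable = executable

    def create_executor(self, device):
        # The factory owns everything needed to build the executor,
        # so the caller only supplies the device.
        return VirtualMachine(self.executable, device)

factory = VMExecutorFactory(executable="compiled-bytecode")
vm = factory.create_executor(device="cpu")
print(vm.device)
```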
@ZihengJiang
Contributor

Some comments:

  • The build interface should also include target_host as an argument;
  • The minimum build pipeline should also include the ToNonDataflow pass;
  • If create_executor refers to returning a VM executor, what name should we use to create the graph_executor and aot_executor?

@YuchenJin
Collaborator Author

Thanks @ZihengJiang!

  • The Target API can now accept both target and target_host; for example, we can write target = tvm.target.Target("llvm", host="llvm").
  • Good point, we should also include ToNonDataflow. I updated the proposal.
  • I think the executor is determined during compilation, for example here, so we can call create_executor directly on the runtime.Module.

@ZihengJiang
Contributor

ZihengJiang commented Nov 24, 2021

The third point sounds strange to me. It looks like the executor type is decided by target and target_host, which is interesting...

Another question: why do we choose the rt_mod["create_executor"] syntax instead of create_executor(rt_mod)?

I don't have further comments besides this. I would suggest putting this doc in the last section of https://github.com/tlc-pack/relax/wiki/Relax-Compilation-MVP-Design, since it is mostly about the build interface.

@YuchenJin
Collaborator Author

"Another question is why we choose to use rt_mod["create_executor"] such syntax instead of create_executor(rt_mod)?"

Good question, and happy to discuss it! First of all, I think it's good to have a unified API for creating the executor and running it. Since runtime.Module is a collection of functions, and a user can invoke a packed function via rt_mod["func_name"](input), this may be an easier calling convention to remember than a separate API like relax.create_executor(rt_mod). Since the VM executor is also a runtime.Module, a user can run a ResNet model with vm["resnet50"](input), which is consistent with creating the executor via rt_mod["create_executor"](device).

And this syntax is the same as the current syntax for creating a graph executor from the factory class, except that I think using create_executor instead of default is clearer.
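The "module is a collection of functions" argument above can be illustrated with a toy module whose `__getitem__` looks up packed functions by name. This is a hypothetical mock, not the real runtime.Module; `register`, `ToyRuntimeModule`, and the function names are illustrative stand-ins.

```python
# Toy illustration of the rt_mod["func_name"](args) calling convention.
class ToyRuntimeModule:
    """Mimics runtime.Module: a lookup table of callable functions."""
    def __init__(self):
        self._functions = {}

    def __getitem__(self, name):
        return self._functions[name]

    def register(self, name, fn):
        self._functions[name] = fn

# The executor is itself a module, so inference uses the same
# bracket-call convention: vm["resnet50"](input).
vm = ToyRuntimeModule()
vm.register("resnet50", lambda x: f"logits({x})")

rt_mod = ToyRuntimeModule()
rt_mod.register("create_executor", lambda device: vm)

executor = rt_mod["create_executor"]("cpu")
print(executor["resnet50"]("image"))
```

The symmetry is the point: creating the executor and calling a compiled function both go through the same `module[name](args)` convention.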

@tqchen
Contributor

tqchen commented Nov 24, 2021

If we are going to reuse the relax compilation MVP, let us rename that to relax minimum build to include the broader vision.

@sunggg
Collaborator

sunggg commented Dec 2, 2021

Hi, @YuchenJin. Thank you for the great proposal!
I have a few questions.

  • In your discussion w/ @ZihengJiang, your example seems to determine the executor based on target and target_host during compilation. Is there any specific reason behind this design? I think users may want a different execution mode for the same target and target_host, so it may be more natural to take that information from users.
    [DISCUSS] Relax minimum build pipeline  #49 (comment)
  • Please correct me if I'm wrong: I'm assuming the minimum build would not include any optimization passes. If so, are we going to have a separate discussion for optimization pass management? Unlike conventional compilers, which include optimizations within build and conduct the build in a progressive-lowering fashion, we want to open up more freedom for optimization passes (e.g., allow feedback/profiling-guided search). I think it is worth thinking about what would be possible/impossible in such an optimization pass design, and how to plug new passes into existing ones.

@YuchenJin
Collaborator Author

Thanks @sunggg, these are great questions!

  • Ideally we want to treat the target (compiler/executor) as a first-class citizen in Relax (an idea @junrushao1994 proposed). Users could specify the target through function attributes as follows:
class MyModule:
  @R.func
  def relax_func(...):
    # specify compiler/executor with function attributes
    R.func_attrs({
      "target": {
        "kind": "cuda",
        ...
        "compiler": "vm",
        "executor": "vm-cudagraph",
      }
    })
    ...

  @R.func
  def trt_func(...):
    R.func_attrs({
      "target": {
        "kind": "cuda",
        ...
        "compiler": "tensorrt",
        "executor": "tensorrt",
      }
    })

In this case, target can be optional in the build API. Since we are developing Relax step by step, from manual to automated, we can make the build API take a target parameter for now, and make it optional after we discuss and finish the first-class compiler/executor design and implementation.

  • Right, the minimum build would not include any optimization passes, and we can have separate discussions for optimization pass management and compilation-flow customization. I know you have good insights on these topics after developing Collage; feel free to open discussion threads about them. :)
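If targets live in per-function attributes as sketched above, one way a build step could use them is to group functions by their declared compiler before lowering each group. This is only a hypothetical sketch of that dispatch step; the attribute dicts and `group_by_compiler` helper are assumptions, not existing Relax APIs.

```python
# Hypothetical sketch: group functions by the "compiler" declared in
# their per-function target attributes, mirroring the example above.
funcs = {
    "relax_func": {"target": {"kind": "cuda", "compiler": "vm",
                              "executor": "vm-cudagraph"}},
    "trt_func":   {"target": {"kind": "cuda", "compiler": "tensorrt",
                              "executor": "tensorrt"}},
}

def group_by_compiler(funcs):
    groups = {}
    for name, attrs in funcs.items():
        compiler = attrs["target"]["compiler"]
        groups.setdefault(compiler, []).append(name)
    return groups

print(group_by_compiler(funcs))
```

A build driver could then hand each group to the matching backend, which is why a module-level target argument becomes optional.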
