proposal(ir): state based implementation #972

kemingy · 2022-10-07T10:03:37Z

related to feasibility-research(lang): Refactor frontend language #91

Signed-off-by: Keming <kemingyang@tensorchord.ai>

muniu-bot · 2022-10-07T10:03:43Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kemingy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [kemingy]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

VoVAllen · 2022-10-07T10:15:25Z

How to merge different stages like llb.Merge?

Also I think bramble's syntax can be an option

kemingy · 2022-10-07T10:19:35Z

> How to merge different stages like llb.Merge?

Will add Merge, Diff, File later.

> Also I think bramble's syntax can be an option

Will take a look.

VoVAllen · 2022-10-07T11:32:13Z

Some examples from bramble https://github.com/maxmcd/bramble/blob/eea4aee51e6ad881166412d61190012fb0d97c56/internal/project/testdata/project/default.bramble

VoVAllen · 2022-10-09T04:29:08Z

I think we should have an idea of what the LLB graph will look like once the user declares more dependencies (such as gcc and PyPI packages), and of how it can still use caches as much as possible.

Simple things should be simple, complex things should be possible.

gaocegege · 2022-10-10T08:01:34Z

I think bramble's example there looks complex.

https://github.com/maxmcd/bramble/blob/eea4aee51e6ad881166412d61190012fb0d97c56/internal/project/testdata/project/default.bramble

It declares the input arguments explicitly. Personally, I prefer the func chain.

kemingy · 2022-10-10T08:17:50Z

> I think we should have an idea of what the LLB graph will look like once the user declares more dependencies (such as gcc and PyPI packages), and of how it can still use caches as much as possible.
>
> Simple things should be simple, complex things should be possible.

Agree. What we have now should be simple. Others like parallelism should be possible.

Method chaining should be enough for sequential commands. Each function should return a state. (ExecState should be hidden by introducing more parameters.)
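A minimal Python sketch of the chaining idea, for illustration only. The `State` class, its method names, and the `env` parameter are hypothetical stand-ins and are not taken from the proposal or from envd. It shows every call returning a state, with exec-only options passed as keyword parameters so no separate exec-state type is exposed to the caller.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical immutable build state: an ordered tuple of recorded steps."""

    steps: Tuple[str, ...] = ()

    def _then(self, step: str) -> "State":
        # Every method returns a new State, which is what makes chaining work.
        return State(self.steps + (step,))

    def apt_install(self, *packages: str) -> "State":
        return self._then("apt install " + " ".join(packages))

    def pip_install(self, *packages: str) -> "State":
        return self._then("pip install " + " ".join(packages))

    def run(self, command: str, env: Optional[Dict[str, str]] = None) -> "State":
        # Exec-only options (here just env) are plain keyword parameters,
        # so the caller never handles a separate exec-state type.
        prefix = "".join(f"{k}={v} " for k, v in sorted((env or {}).items()))
        return self._then(prefix + command)


chain = (
    State()
    .apt_install("g++")
    .pip_install("numpy")
    .run("make", env={"CC": "gcc"})
)
for step in chain.steps:
    print(step)
```

Because the sketch keeps states immutable, an intermediate value can be reused as the start of another chain without affecting the first one.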

gaocegege · 2022-10-10T12:37:14Z

Some questions:

* Is it possible to auto-merge the two chains? Merge/diff should be advanced statements.
* How to integrate it with the envdlib?

kemingy · 2022-10-11T08:20:50Z

> * Is it possible to auto-merge the two chains? Merge/diff should be advanced statements.

I think it's hard. (correct me if I'm wrong)

Starlark doesn't support operator overloading like conda_state + vscode_state. But I think we can introduce a new method like conda_state.merge([vscode_state]) if it's helpful.

Besides, we need to explicitly use root.state() to get a copy when another branch is not built from scratch. Otherwise, we don't know where the branches diverge.
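The fork-and-merge idea above can be sketched in Python as follows. This is an illustration under assumptions: the `State` class and its `run`, `state`, and `merge` methods are hypothetical and only mimic the names mentioned in this comment. Chained calls mutate and return the same object, so an explicit `state()` copy is what marks the point where two branches diverge.

```python
from typing import List, Optional


class State:
    """Hypothetical mutable build state; chained calls extend the same object."""

    def __init__(self, steps: Optional[List[str]] = None) -> None:
        self.steps: List[str] = list(steps or [])
        self.merged: List["State"] = []

    def run(self, command: str) -> "State":
        self.steps.append(command)
        return self  # same object, so chaining keeps extending one branch

    def state(self) -> "State":
        # Explicit fork point: the copy shares the history so far, then diverges.
        return State(self.steps)

    def merge(self, others: List["State"]) -> "State":
        # Method-based stand-in for an operator such as conda_state + vscode_state.
        self.merged.extend(others)
        return self


root = State().run("apt install curl")
conda_state = root.state().run("install conda")
vscode_state = root.state().run("install vscode extensions")
final = conda_state.merge([vscode_state])

print(root.steps)
print(final.steps, [branch.steps for branch in final.merged])
```

Without the two `state()` calls, both branches would append to `root` itself and there would be no recorded point of divergence.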

> * How to integrate it with the envdlib?

Some ideas:

envdlib can provide functions built from scratch or from a user-provided state:
- from scratch: root.merge([envdlib.compile_rust_serving()])
- from a state: root = envdlib.tensorboard(conda_state) or root.apply(envdlib.tensorboard, host_port=9000) so users can continue chaining

We should provide Source.envd_python() which is equivalent to base(os="ubuntu20.04", language="python3").
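As a rough illustration of the "from a state" option, the Python sketch below shows how an `apply` method could forward a state to a library function and keep the chain going. Everything here is hypothetical: the `State` class, the `tensorboard` function body, and its default port are stand-ins written for this example, not envdlib code.

```python
from typing import Any, Callable, Optional, Tuple


class State:
    """Hypothetical build state holding an ordered tuple of step descriptions."""

    def __init__(self, steps: Optional[Tuple[str, ...]] = None) -> None:
        self.steps: Tuple[str, ...] = tuple(steps or ())

    def run(self, command: str) -> "State":
        # Return a new State so the receiver is left untouched.
        return State(self.steps + (command,))

    def apply(self, fn: Callable[..., "State"], **kwargs: Any) -> "State":
        # Hand this state to a library function and return its result,
        # so the caller can keep chaining afterwards.
        return fn(self, **kwargs)


def tensorboard(state: State, host_port: int = 6006) -> State:
    # Hypothetical library function that extends a user-provided state.
    return state.run(f"install tensorboard (host port {host_port})")


conda_state = State().run("install conda")

# Style 1: call the library function directly with the state.
direct = tensorboard(conda_state)

# Style 2: apply() keeps the call inside the method chain.
chained = conda_state.apply(tensorboard, host_port=9000).run("pip install torch")

print(direct.steps)
print(chained.steps)
```

The second style reads left to right like the rest of a chain, at the cost of one extra method on the state type.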

docs/proposals/20221007-ir-state.md

gaocegege · 2022-10-12T05:07:53Z

Then could the new language syntax be compatible with the existing design?

Or is it a total breaking change?

BTW, could you please provide the example for python-basic with the new design?

kemingy · 2022-10-12T05:26:53Z

> Then could the new language syntax be compatible with the existing design?
>
> Or is it a total breaking change?

I think it will be a breaking change.

> BTW, could you please provide the example for python-basic with the new design?

It has already been added to the proposal. PTAL.

gaocegege · 2022-10-12T07:27:19Z

@VoVAllen WDYT

I have no opinion on it. Let's start by researching whether Starlark supports it.

VoVAllen · 2022-10-12T08:00:48Z

I'm a bit concerned about the current proposal. The current design is detail-oriented, which makes it more complicated than the original design.
Also, the current design looks similar to LLB, so we could consider exposing LLB-like primitives directly. Explicit dependency declaration is an advanced feature; LLB primitives would be easier for us to maintain and would ensure that "complex things are possible".

Some personal thoughts:
Explicitly define two or three kinds of stages: base stages, envd-managed stages (install.python_packages etc.), and user-managed stages (run(XXX)).

The difference between them is:

* base stages can be overwritten by custom images, and are managed by envd if not specified
* envd-managed stages are parallelized and use the cache as much as possible to accelerate the process, so no dependencies can be set here
* user-managed stages can be fully customized, with explicit dependencies
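One way to picture the three kinds of stages is the small Python sketch below. It is only an illustration: `StageKind`, `Step`, and the validation rule are hypothetical names made up for this example. It encodes the rule stated above that only user-managed steps may declare an explicit dependency.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StageKind(Enum):
    BASE = "base"
    ENVD_MANAGED = "envd-managed"
    USER_MANAGED = "user-managed"


@dataclass(frozen=True)
class Step:
    """Hypothetical build step tagged with the kind of stage it belongs to."""

    kind: StageKind
    command: str
    depends_on: Optional["Step"] = None

    def __post_init__(self) -> None:
        # Only user-managed steps may name an explicit dependency; the other
        # kinds are left for the tool to order and parallelize on its own.
        if self.depends_on is not None and self.kind is not StageKind.USER_MANAGED:
            raise ValueError(f"{self.kind.value} steps cannot declare dependencies")


gxx = Step(StageKind.USER_MANAGED, "apt install g++")
inhouse = Step(StageKind.USER_MANAGED, "pip install package_needs_g++", depends_on=gxx)
torch = Step(StageKind.ENVD_MANAGED, "pip install torch")

try:
    Step(StageKind.ENVD_MANAGED, "pip install torch", depends_on=gxx)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```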

Other ideas:

All functions provided by envd could take a new argument, for example one called state.

In user stages, the user can do:

state = stage('user')
state1 = install.apt_packages(["g++"], state=state)
new_state = install.python_packages(["package_needs_g++"], state=state1)

and to define it as a custom function:

def install_inhouse_package():
  state = stage('user')
  state1 = install.apt_packages(["g++"], state=state)
  new_state = install.python_packages(["package_needs_g++"], state=state1)
  # envd_output is a builtin variable; add merges the state into the final output
  envd_output.add(new_state)
  return new_state

To use it:

def build():
  install.python_packages(['torch'])
  # call the custom function defined above
  install_inhouse_package()

WDYT

kemingy · 2022-10-12T08:59:42Z

> I'm a bit concerned about the current proposal. The current design is detail-oriented, which makes it more complicated than the original design. Also, the current design looks similar to LLB, so we could consider exposing LLB-like primitives directly. Explicit dependency declaration is an advanced feature; LLB primitives would be easier for us to maintain and would ensure that "complex things are possible".
>
> Some personal thoughts: Explicitly define two or three kinds of stages: base stages, envd-managed stages (install.python_packages etc.), and user-managed stages (run(XXX)).
>
> The difference between them is:
> * base stages can be overwritten by custom images, and are managed by envd if not specified
> * envd-managed stages are parallelized and use the cache as much as possible to accelerate the process, so no dependencies can be set here
> * user-managed stages can be fully customized, with explicit dependencies

This is similar to the current implementation and this proposal. We do have different stages; they are just not explicit.

We can provide the install.conda_python() function. So users who start with the custom images can use it to install the python environment.

> Other ideas:
>
> All functions provided by envd could take a new argument, for example one called state.
>
> In user stages, the user can do:
>
> state = stage('user')
> state1 = install.apt_packages(["g++"], state=state)
> new_state = install.python_packages(["package_needs_g++"], state=state1)
>
> and to define it as a custom function:
>
> def install_inhouse_package():
>   state = stage('user')
>   state1 = install.apt_packages(["g++"], state=state)
>   new_state = install.python_packages(["package_needs_g++"], state=state1)
>   # envd_output is a builtin variable; add merges the state into the final output
>   envd_output.add(new_state)
>   return new_state
>
> To use it:
>
> def build():
>   install.python_packages(['torch'])
>   install_inhouse_package()
>
> WDYT

Defining the dependencies with an extra state argument is acceptable but not very user-friendly.

The LLB-like syntax is only complex when you need to use diff and merge. Otherwise, the method chaining should be a simple solution.

kemingy · 2022-10-13T00:50:44Z

One more thing, this is incompatible with config.envd.

proposal(ir): state based implementation

1148229

Signed-off-by: Keming <kemingyang@tensorchord.ai>

muniu-bot bot added the do-not-merge/work-in-progress label Oct 7, 2022

muniu-bot bot requested review from aseaday, gaocegege and terrytangyuan October 7, 2022 10:03

muniu-bot bot added the approved label Oct 7, 2022

Merge remote-tracking branch 'upstream/main' into proposal_ir

6c46277

Merge branch 'main' into proposal_ir

90241b1

update example

5207862

Signed-off-by: Keming <kemingyang@tensorchord.ai>

kemingy marked this pull request as ready for review October 10, 2022 10:17

kemingy requested a review from VoVAllen as a code owner October 10, 2022 10:17

muniu-bot bot removed the do-not-merge/work-in-progress label Oct 10, 2022

move merge to Source

4e09b6e

Signed-off-by: Keming <kemingyang@tensorchord.ai>

gaocegege reviewed Oct 12, 2022

kemingy added 2 commits October 12, 2022 13:13

rm Source

d43bc3e

Signed-off-by: Keming <kemingyang@tensorchord.ai>

add example

b17c54a

Signed-off-by: Keming <kemingyang@tensorchord.ai>

split base and python env

09edfbe

Signed-off-by: Keming <kemingyang@tensorchord.ai>

kemingy mentioned this pull request Nov 1, 2022

feat: Better envdlib syntax support #1132

Open

kemingy mentioned this pull request Dec 2, 2022

discussion: How to maintain different frontend language versions #1249

Closed
