
[AOT] Serialize compute graph in python and load in C++ runtime #4786

Open
ailzhang opened this issue Apr 14, 2022 · 4 comments
Assignees
Labels
feature request Suggest an idea on this project

Comments

@ailzhang
Contributor

Copied from #4615 :P : This would significantly reduce the effort of porting a demo written in Taichi to non-Python environments. In essence, the AOT module will save not only the Taichi kernels but also the host logic that invokes them.
[Will add more context as we start adding this feature]

@ailzhang ailzhang added the feature request Suggest an idea on this project label Apr 14, 2022
@ailzhang ailzhang self-assigned this Apr 14, 2022
@k-ye k-ye added this to the Taichi v1.1.0 milestone Apr 14, 2022
@k-ye k-ye moved this to Todo in Taichi Lang Apr 14, 2022
@ailzhang ailzhang moved this from Todo to In Progress in Taichi Lang Apr 26, 2022
@ailzhang
Contributor Author

ailzhang commented Apr 26, 2022

Update 04/27 (work in progress):
We've converged on an API style for the simplest case.

# One way to produce a compute graph automatically.
@ti.aot.graph
def run_sim():
    kernel1()
    kernel2()

# Or, more explicitly:
g = ti.aot.Graph()
func = g.add_func('func')
func.append(kernel, ...)

mod = ti.aot.Module()
mod.add_graph(run_sim)

// In C++.
// Note that the serialization format is an implementation detail we haven't
// settled on yet, but it's pretty flexible.
graph = aot_module->load_graph("...");
graph->run(host_ctx);

Note it's easy to support static control flow, but we're spending some time exploring how we're supposed to support dynamic control flow. Is it going to be in-graph or out-of-graph? (Concepts borrowed from the "Dynamic Control Flow in Large-Scale Machine Learning" paper.)

  • in-graph: encodes control-flow decisions as operations in the dataflow graph.
  • out-of-graph: implements control-flow decisions in the client process, using control-flow primitives of a host language like Python.

  1. The out-of-graph approach is more like the traditional tracing approach:
  • Loops are unrolled and branches are selected based on Python runtime values.
  • You cannot prevent users from using device-to-host syncs for control flow, which might hurt performance a lot.
  2. For the in-graph approach we need to decide whether control-flow support is explicit or implicit:
  • explicit: more like the JAX style
  • implicit: more like the TorchScript style

We're currently building a toy prototype just to get some sense of usability and coverage using the explicit in-graph approach.
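For concreteness, the contrast between the two styles can be sketched in plain Python. This is a hypothetical illustration, not the actual Taichi API: `trace_out_of_graph` and `cond_node` are made-up names, and the "kernels" are ordinary functions mutating a dict.

```python
def kernel_a(state):  # stand-in for a Taichi kernel
    state["x"] += 1

def kernel_b(state):  # stand-in for a Taichi kernel
    state["x"] *= 2

# Out-of-graph: control flow lives in the client (Python). Tracing records
# only the branch actually taken, so the decision is baked in at build time.
def trace_out_of_graph(flag):
    ops = []
    if flag:  # decided at trace time
        ops.append(kernel_a)
    else:
        ops.append(kernel_b)
    return ops

# In-graph (explicit): the conditional is itself a graph node, evaluated
# at graph run time, so one graph covers both branches.
def cond_node(pred_key, true_op, false_op):
    def run(state):
        (true_op if state[pred_key] else false_op)(state)
    return run

state = {"x": 3, "flag": False}
graph = [cond_node("flag", kernel_a, kernel_b)]
for node in graph:
    node(state)  # "flag" is read when the graph runs, not when it is built
```

The key difference this sketch shows: the out-of-graph trace must be rebuilt when `flag` changes, while the in-graph conditional reads the predicate from runtime state on every invocation.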

@ailzhang
Contributor Author

ailzhang commented May 16, 2022

Update 05/16 (work in progress):

We're considering adding support for static compute graph construction and its execution.

Goals:

  • The graph can be serialized, then deployed and run in an environment without a Python runtime.
  • It'd be super helpful if we could run the graph in the Python frontend as if it were deployed, to make the AOT debugging experience better.
  • Normal Taichi Python frontend users can opt in to graph execution mode as well. We've noticed fairly heavy overhead from Python->C++ communication, especially for small kernels on a powerful GPU. Launching a graph instead of individual small kernels from Python can dramatically reduce the overhead for these light kernels.

Non-goals:

  • Graph mode won't be as flexible as the current Python JIT (dynamic graph), since it won't allow host execution (like returning a value from a Taichi kernel to Python and conditioning on it).

Key terminology (credits to @bobcao3):

  • Each graph is composed of a series of nodes. There are currently three types of nodes: Dispatch, Sequential, and Conditional. When a graph is invoked, we evaluate it by evaluating all the individual nodes sequentially. Running a node doesn't return any value to Python.
  • Dispatch is a basic node that executes a Taichi kernel with a specific set of arguments.
  • A Sequential node can be considered a list of nodes, where each node is evaluated sequentially.
  • Conditional: to be added.
  • You can view a graph as a container with a root Sequential node; it also manages the alloc/dealloc of the nodes inside the graph.
  • Symbolic arguments: users are required to create symbolic arguments and define the data flow when building the graph. When you invoke the graph, you must pass runtime values to the corresponding arguments. Currently only scalars/vectors/matrices and ndarrays are supported as runtime values.
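The node hierarchy above can be sketched in plain Python. This is an illustrative toy, not the real implementation: class and method names are assumptions, and the "kernels" are ordinary functions bound to symbolic argument names that get resolved at invocation time.

```python
class Dispatch:
    """Leaf node: runs one kernel with bound symbolic argument names."""
    def __init__(self, kernel, *arg_names):
        self.kernel = kernel
        self.arg_names = arg_names

    def run(self, args):
        # Resolve symbolic names to runtime values when the graph is invoked.
        self.kernel(*(args[name] for name in self.arg_names))

class Sequential:
    """A list of nodes evaluated in order; returns nothing to the host."""
    def __init__(self):
        self.nodes = []

    def append(self, node):
        self.nodes.append(node)

    def run(self, args):
        for node in self.nodes:
            node.run(args)

class Graph:
    """Container owning a root Sequential node."""
    def __init__(self):
        self.root = Sequential()

    def run(self, args):
        self.root.run(args)

# Usage: two tiny "kernels" operating on a shared buffer.
buf = [0, 0, 0]

def fill(v):
    for i in range(len(buf)):
        buf[i] = v

def scale(v):
    for i in range(len(buf)):
        buf[i] *= v

g = Graph()
g.root.append(Dispatch(fill, "init"))
g.root.append(Dispatch(scale, "factor"))
g.run({"init": 2, "factor": 5})  # nodes evaluated sequentially
```

Note how no node returns a value to the caller; all effects go through the runtime arguments, which is what makes the whole structure serializable.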

Proposed APIs and a typical workflow:

** Note these are not finalized; please feel free to comment if you have any suggestions!

The code below is truncated to make it easier to understand; please see the "Prototype" section for a full example.

  • Build a compute graph
g_update = ti.graph.Graph('update')
substep = g_update.create_sequential()
float2 = ti.types.vector(2, float)
sym_grid_v = ti.graph.Arg('grid_v', dtype=float2) 
... 
substep.emplace(substep_reset_grid, sym_grid_v, sym_grid_m) # TODO: consider using append instead of emplace, just a bit more checks in python. 
substep.emplace(substep_p2g, sym_x, sym_v, sym_C, sym_J, sym_grid_v,
                sym_grid_m)

for i in range(500):
    g_update.append(substep)
  • Compile
g_update.compile()
  • Execute in Python frontend
x = ti.Vector.ndarray(2, ti.f32, shape=(n_particles))
v = ti.Vector.ndarray(2, ti.f32, shape=(n_particles))

C = ti.Matrix.ndarray(2, 2, ti.f32, shape=(n_particles))
J = ti.ndarray(ti.f32, shape=(n_particles))
grid_v = ti.Vector.ndarray(2, ti.f32, shape=(n_grid, n_grid))
grid_m = ti.ndarray(ti.f32, shape=(n_grid, n_grid))
g_update.run({'x': x, 'v': v, 'C': C,  'J': J, 'grid_v': grid_v, 'grid_m': grid_m})
  • Serialize to prepare for running in C++
    mod = ti.aot.Module(ti.vulkan)
    mod.add_graph(g_update)
    mod.save('shaders', '')
  • Run in C++
    std::unique_ptr<taichi::lang::aot::Module> module = taichi::lang::aot::Module::load(taichi::Arch::vulkan, mod_params);
    auto g_update = module->load_graph("update");

    std::unordered_map<std::string, taichi::lang::aot::IValue> args; // C++ version, will change this to C API

    args.insert({"grid_v", taichi::lang::aot::IValue(devalloc_grid_v, N_GRID * N_GRID * 2 * sizeof(float), {N_GRID, N_GRID, 2})});
    args.insert({"grid_m", taichi::lang::aot::IValue(devalloc_grid_m, N_GRID * N_GRID * sizeof(float), {N_GRID, N_GRID})});

    g_update->run(args);

Preliminary results

In terms of reducing Python launch overhead, we noticed a 3x speedup (15fps -> 45fps) when running the mpm88 example (500 substeps) on an RTX 3090 after switching to graph execution mode.

Q&A:

  • How about control flow?
    Since graph execution mode is mainly designed to maximize performance, dynamic control flow (which involves host execution) is temporarily out of scope. But we might add support for ti.cond(field_val, true_clause, false_clause) on supported hardware.

  • What is supported as graph arguments?
    Currently scalars and ndarrays are supported. Support for Taichi fields will be added after https://github.com/taichi-dev/taichi/blob/master/docs/rfcs/20220413-aot-for-all-snode.md is implemented.

Prototype:

Check out a proof-of-concept implementation and an mpm88 example in C++.

@k-ye
Member

k-ye commented May 17, 2022

Thanks for summarizing this & looks great!

substep = g_update.create_sequential()

Have we discussed whether a Sequential node is attached to a specific graph instance, or can be a general node? In the latter case, we would write substep = ti.graph.create_sequential(). I feel like making it a general node is more intuitive, but there could be factors that I forgot to consider.

substep.emplace(substep_reset_grid, sym_grid_v, sym_grid_m)

I still feel like emplace is too C++. If we view the graph as declarative, I think it could be substep.call(kernel, args...).

ailzhang pushed a commit to ailzhang/taichi that referenced this issue May 19, 2022
This PR serves as the base PR with a minimal example of building and
running a Graph. Runtime values for graph arguments can be either
scalars or ndarrays.

For the detailed proposal please see taichi-dev#4786.

Things handled in this PR:
- Maximize common code/runtime shared by the two workflows below:
  1. build -> compile -> run
  2. build -> compile -> serialize -> deserialize -> run
- Graph arguments are annotated with dtype and element shape for ndarray (temporary
until we have vec3 types in C++)

Things that we've discussed but not included in this PR:
- C API: I'll leave that for a unified C API PR in the future.
- bind IValues to graph: easy, will add later.
ailzhang pushed a commit to ailzhang/taichi that referenced this issue May 19, 2022
related: taichi-dev#4786

This PR demonstrates a minimal example of serializing a built graph,
deserializing and running it.
ailzhang pushed a commit to ailzhang/taichi that referenced this issue May 22, 2022
related: taichi-dev#4786

[Update]: based on an offline discussion with @k-ye, I've split the
original `Graph` class into `GraphBuilder` and `CompiledGraph` classes
in C++.

```
GraphBuilder
    |
 compile()
    |
    |
CompiledGraph <----  serialize/deserialize ----> file
    |
    |
   run()
```

This PR demonstrates a minimal example of serializing a built graph,
deserializing and running it.

ghstack-source-id: e369b326734b3cffaf91262ed84806044e121c3c
Pull Request resolved: taichi-dev#5016
ailzhang pushed a commit that referenced this issue May 24, 2022
related: #4786

[Update]: based on an offline discussion with k-ye, I've split the
original `Graph` class into `GraphBuilder` and `CompiledGraph` classes
in C++. Note that the implementation doesn't exactly follow the builder
design pattern, as our builder is slightly simpler, as shown below.
The complexity in our problem lies more in the need for serialization and
deserialization of the same graph representation, rather than in its
construction process. So IMHO it's good enough to separate the
GraphBuilder and the Runner (`CompiledGraph`) as we discussed. Please feel
free to correct me if I'm wrong!

```
GraphBuilder
    |
 compile()
    |
    |
CompiledGraph <----  serialize/deserialize ----> file
    |
    |
   run()
```

This PR demonstrates a minimal example of serializing a built graph,
deserializing and running it.

ghstack-source-id: 7dda7cc11ef3a946f31d75783a8cfd1836e47ba5
Pull Request resolved: #5024
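The GraphBuilder/CompiledGraph split can be sketched in plain Python. This is a hypothetical mock, not the real C++ classes: the kernel registry, method names, and JSON serialization are all illustrative assumptions (the issue explicitly leaves the real serialization format as an implementation detail).

```python
import json

KERNELS = {}  # toy registry mapping kernel names to callables

def kernel(fn):
    KERNELS[fn.__name__] = fn
    return fn

class GraphBuilder:
    """Mutable construction side: collects dispatches, then compiles."""
    def __init__(self):
        self.dispatches = []  # list of (kernel_name, arg_names)

    def dispatch(self, fn, *arg_names):
        self.dispatches.append((fn.__name__, list(arg_names)))

    def compile(self):
        return CompiledGraph(self.dispatches)

class CompiledGraph:
    """Immutable run side: can be serialized, deserialized, and run."""
    def __init__(self, dispatches):
        self.dispatches = dispatches

    def serialize(self):
        return json.dumps(self.dispatches)

    @staticmethod
    def deserialize(data):
        return CompiledGraph([tuple(d) for d in json.loads(data)])

    def run(self, args):
        for name, arg_names in self.dispatches:
            KERNELS[name](*(args[a] for a in arg_names))

@kernel
def add(buf, v):  # stand-in for a Taichi kernel
    for i in range(len(buf)):
        buf[i] += v

builder = GraphBuilder()
builder.dispatch(add, "buf", "v")
blob = builder.compile().serialize()   # build -> compile -> serialize
g = CompiledGraph.deserialize(blob)    # deserialize -> ...
buf = [1, 2, 3]
g.run({"buf": buf, "v": 10})           # ... -> run
```

This mirrors the diagram in the commit message: the builder exists only on the construction side, while the compiled graph is the artifact that crosses the serialize/deserialize boundary and is what the deployed runtime executes.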
neozhaoliang added a commit that referenced this issue Jun 14, 2022
* [Build] Switch to scikit-build as the build backend (#4624)

* switch to skbuild

* Switch the build system to scikit-build

* include bc and libmolten

* find llvm runtime bc

* fix bc files installation

* install bc after compile

* Add more message

* Auto Format

* fix findpython

* Kickstart CI

* add empty line

* add missing dependency

* fix python args

* start CI

* Fix clang tidy run

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Taichi Gardener <taichigardener@gmail.com>
Co-authored-by: Ailing <ailzhang@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [build] Install export core library to build dir (#4866)

* [misc] Bump version to v1.0.2 (#4867)

* [Bug] Remove redundant AllocStmt when lowering FrontendWhileStmt (#4870)

* [build] [bug] Fix a bug of skbuild that loses the root package_dir (#4875)

* [ci] Add libtaichi_export_core build for desktop in CI (#4871)

* [Build] [refactor] Define runtime build target (#4838)

* Move LLVM Cmake to its own dir

* Suppress warning from submodules

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use current source dir

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate Vulkan runtime files from codegen

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Doc] Add limitation about TLS optimization (#4877)

* [Doc] Add limitation about TLS optimization

* Add link to reduction sum benchmark

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Haidong Lan <turbo0628g@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [ci] Use the updated docker image for libtaichi_export_core (#4881)

* [refactor] Add ASTSerializer and use it to generate offline-cache-key (#4863)

* Add ASTSerializer, using it to generate offline-cache-key

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [build] Change the library output dir for export core (#4880)

* Change the library output dir for export core

* limit the change to the target

* [vulkan] Device API explicit semaphores (#4852)

* Device API explicit semaphores

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Destroy the semaphore before the context

* Fix type warnings

* fix nits

* return nullptr for devices that don't need semaphores

* test out no semaphores between same queue

* Use native command list instead of emulated for dx11

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove the in-queue semaphore

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use flush instead of sync in places

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix possible null semaphore

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [metal] Complete Device API (#4862)

* [metal] Complete Device API

* fix

* fix

* [Doc] Updated links that may break. (#4874)

* Updated logo

* Updated links that may break when the doc site has versions

* Added information that numpy arrays and torch tensors can be passed as arguments

* Fixed a broken link.

* [error] [lang] Improved error messages for illegal slicing or indexing to ti.field (#4873)

* [bug] Improved error messages for ilegal slicing or indexing to ti.field

* Fixed test failures

* Addressed code-review comments

* [metal] Migrate runtime's MTLBuffer allocation to unified device API (#4865)

* wip

* migrate all buffers

* [Build] [refactor] Use keywords instead of plain target_link_libraries CMake (#4864)

* Move LLVM Cmake to its own dir

* Suppress warning from submodules

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use current source dir

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate Vulkan runtime files from codegen

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use keywords instead of plain target_link_libraries

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [bug] Fixed type promotion rule for bit-shift operations (#4884)

* [bug] Fixed type promotion rule for shift operations

* removed debug info

* Addressed review comments

* [aot] [vulkan] Expose symbols for AOT (#4879)

* [aot] [vulkan] Expose symbols for AOT

* weird windows

* hide to make win happy

* fix

* [Build] [refactor] Define Cmake OpenGL runtime target (#4887)

* Move LLVM Cmake to its own dir

* Suppress warning from submodules

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use current source dir

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate Vulkan runtime files from codegen

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use keywords instead of plain target_link_libraries

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate opengl runtime files from backend

* Remove some warnings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Minor

* Add glfw include

* Add link to taichi core

* Update taichi/program/extension.cpp

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: yekuang <k-ye@users.noreply.github.com>

* [vulkan] Fix typo for waitSemaphoreCount (#4892)

* [vulkan] Add new VMA vulkan functions. (#4893)

* Add new VMA vulkan functions.

* fix

* Use Ninja generator on Windows and skip generator test (#4896)

* [Lang] [test] Copy-free interaction between Taichi and PaddlePaddle (#4886)

* Implement has_paddle(), to_paddle_type() and update to_taichi_type in python\taichi\lang\util.py

* Implement get_paddle_callbacks() and update get_function_body(), match_ext_arr() in python\taichi\lang\kernel_impl.py

* Add test test_io_devices() in tests\python\test_torch_io.py

* Implement callback for CPU-GPU/GPU-CPU copy between Taichi and Paddle

* Partially implement to_torch()/from_torch() according to PyTorch in Taichi

* Fix paddle.Tensor's backend check

* Update tests for from_paddle()/to_paddle()

* [doc] Update Global settings with TI_ENABLE_PADDLE

* Fix to avoid fail when only import paddle

* [test] Fix the expected list alphabetically

* [doc] Add info about paddle.Tensor

* [ci] Try to test paddle's GPU version

* Fix the usage of paddle.ones

* Fix f16 tests for paddle

* Fixed supported archs for tests of paddle

* Use 1 thread run tests for torch and paddle

* Fix linux test

* Fix windows test

* Unify the name to Paddle

* Add tests for paddle

* Replace usage of device to place for paddle

* Paddle's GPU develop package on Linux import error

* [test] Cancel tests for Paddle on GPU (#4914)

* remove debug print (#4883)

* [Doc] Updated broken links (#4912)

* [Doc] Updated broken links

* Updated links that may break.

* Added .md

* [test] Exit on error during Paddle windows test (#4910)

* [test] Exit on error during Paddle windows test

* Check if paddle test leaks memory

* Increase device memory and reduce thread number

* Revert "Check if paddle test leaks memory"

This reverts commit e0522b0e520050fb50d2c338a2a7d0b2a363bfb0.

* Disable paddle for non-paddle test

* [build] Warning Suppression PR #2: Fixed codebase warnings (#4909)

* [SIMT] Add syncwarp warp intrinsics (#4917)

* add warp_barrier warp intrinsic

add warp_barrier unit test

fix error: add Args mask in warp.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [refactor] Create MatrixImpl to differentiate Taichi and Python scopes (#4853)

* wip

* wip

* wip

* wip

* wip

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup

* fix impl._subscript()

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix mesh

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix useless __init__

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix py-scope subscript

* fix swizzle

* fix doc

* fix api

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [build] Warning Suppression PR #1: Turned on -Wno-ignored-attributes & Removed unused functions (#4916)

* [SIMT] Add activemask warp intrinsics (#4918)

* add activemask warp intrinsic

add test function call

del extra space

unit-test print->assert

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [build] Warning Suppression PR #3: Eliminate warnings from third-party headers (#4920)

* [build] Warning Suppression PR #1: Turned on -Wno-ignored-attributes & Removed unused functions

* [build] Warning Suppression PR #2: Eliminate warnings from third-party headers

* Fixed a warning with enum comparison

* [build] Warning Suppression PR #4: Fixed warnings with MacOS (#4926)

* [build] Warning Suppression PR #1: Turned on -Wno-ignored-attributes & Removed unused functions

* [build] Warning Suppression PR #2: Eliminate warnings from third-party headers

* Fixed a warning with enum comparison

* [build] Warning Suppression PR #4: Fixed Mac-specific warnings

* [refactor] Simplify Matrix's initializer (#4923)

* [refactor] Simplify Matrix's initializer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update python/taichi/lang/matrix.py

* Update python/taichi/lang/matrix.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Doc] Updated relative path (#4929)

* Update field.md

* Updated one broken link.

* [Lang] Support sparse matrix datatype and storage format configuration (#4673)

* Add sparse matrix datatype configuration

* create sparse matrix with datatype in Python

* sparse solver takes as sparse matrix with datatype parameters

* operator overloading with bug

* fix operator overloading bugs

* Add more operator overloading functions

* EigenSparseMatrix operator overloading

* improve

* Clang-tidy

* add more datatype EigenSparseMatrix

* get/set element bug fix

* Bugfix:sparse matrix shape configuration

* improve sparse matrix test cases

* Update tests/python/test_sparse_matrix.py

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* improve

* Update taichi/program/sparse_matrix.h

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
Co-authored-by: taichiCourse01 <tgc01@taichi.graphics>

* [lang] Fix type check warnings for ti.Mesh (#4930)

* fix

* fix

* [SIMT] Add uni_sync warp intrinsics (#4927)

* [SIMT] Add uni_sync warp intrinsics

* [build] Enable -Werror on Linux & Mac (#4941)

* [build] Turn on -Werror on Linux and Mac platforms (#4928)

* [build] Turn on -Werror on Linux and Mac platforms

* Added documentations for Werror

* Patched documentation

* [doc] Updated documentations for implicit type casting rules (#4885)

* [doc] Updated documentations for type promotion rules

* Rearranged type promotion docs

* [refactor] Remove unused snode_trees in ProgramImpl interface (#4942)

* [refactor] Remove unused snode_trees in ProgramImpl interface

* Update taichi/codegen/codegen_llvm.h

* [build] Turned off -Werror temporarily for issues with performance-bot (#4946)

* [refactor] [llvm] Remove struct_compiler_ as a member variable (#4945)

* [build] Limit -Werror to Clang-compiler only (#4947)

* [build] Enable -Werror on Linux & Mac

* [build] Limit -Werror to Clang-compiler only

* [ci] Fix Nightly (#4948)

* [ci] Fix nightly test

* Add python 3.7 3.9 in nightly

* [Build] Improved building on Windows (#4925)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Lang] Add more functions to math module (#4939)

* add more functions to math module

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more functions to math module

* add more functions to math module

* add more functions to math module

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more functions to math module

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more functions to math module

* add more functions to math module

* Update _funcs.py

* Update python/taichi/_funcs.py

Co-authored-by: pengyu <6712304+FantasyVR@users.noreply.github.com>

* Update python/taichi/math/mathimpl.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pengyu <6712304+FantasyVR@users.noreply.github.com>

* [lang] [bug] Implement Expression serializing and fix some bugs (#4931)

* Serialize Expression and remove old useless ExpressionOfflineCacheKeyGenerator

* Fix some bugs(reported by test_assert and test-snodes with offline_cache=True)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [refactor] Add ArrayMetadata to store the array runtime size (#4950)

* [refactor] Add ArrayMetadata to store the array runtime size

* rm macros

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* revert to debug

* decompose

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [refactor] Some renamings (#4959)

* [lang] Add reference type support on real functions (#4889)

* wip

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* add test

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test_api

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [llvm] Move cache directory to dump() (#4963)

* [llvm] Move cache directory to dump()

* fix

* fix

* [RFC] AOT for all SNodes (#4806)

* [rfc] AOT for all SNodes

* add rfc tag

* fix

* fix

* Update docs/rfcs/20220413-aot-for-all-snode.md

Co-authored-by: Ailing  <ailzhang@users.noreply.github.com>

* fix

* soa

* more contents on autodiff, add_field()

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* toc

* Update docs/rfcs/20220413-aot-for-all-snode.md

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Update docs/rfcs/20220413-aot-for-all-snode.md

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Update docs/rfcs/20220413-aot-for-all-snode.md

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* improvements

* improvements

Co-authored-by: Ailing  <ailzhang@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* [ci] Add new buildbot with latest driver for Linux/Vulkan test (#4953)

* [ci] Add new buildbot with latest driver for Linux test

* Removed unused Jenkinsfile and travis

* Ref to issue

* Change matrix

* Change matrix format

* Change indented maybe

* String maybe

* First remove runs-on

* Minor

* Use nested array

* Use nested array

* Use nested array

* Debug path

* Revert "Debug path"

This reverts commit 000db2ad746f1d670e7fa7c9bdd1fad0209b8147.

* Debug path

* Revert

* Remove trailing space

* [vulkan] Set kApiVersion to VK_API_VERSION_1_3 (#4970)

* Change vulkan version to fix AMD crash problem.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [bug] [simt] Fix the problem that some intrinsics are never called (#4957)

* [bug] [simt] Fix the problem that some intrinsics are never called

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [llvm] Create ModuleToFunctionConverter (#4962)

* [llvm] Create ModuleToFunctionConverter

* fix wild pointer

* get_compute_device

* [build] Fixed Illegal Instruction Error when importing PaddlePaddle module (#4969)

* Trigger CI failure

* [build] Fixed Illegal Instruction Error when importing PaddlePaddle module

* CI run: second time

* CI run: third time

* Log hardware info for CI build-bot

* [test] Add an ndarray test in C++. (#4972)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [llvm] Make codegen produce static llvm::Module (#4975)

* [llvm] Make codegen produce static llvm::Module

* Update taichi/codegen/codegen_llvm.h

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* TI_WITH_LLVM

* fix

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [ci] [build] Containerize Windows CPU build and test (#4933)

* [ci] [build] Containerize Windows CPU build and test

* Disable ninja

* Avoid pybind11_add_module()

* Force reinstall

* Find pybind11

* Include pybind11 dir

* Update include dir

* Remove trailing whitespace

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use correct pybind11

* Add path

* Enable no extras for pybind11_add_module

* Add no_extra

* Clone in the container

* Use github job container

* Add runs-on

* Revert back to docker based jobs

* Install instead of develop

* [ci] [build] Containerize Windows CPU build and test

* Disable ninja

* Avoid pybind11_add_module()

* Force reinstall

* Find pybind11

* Include pybind11 dir

* Update include dir

* Remove trailing whitespace

* Use correct pybind11

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add path

* Enable no extras for pybind11_add_module

* Add no_extra

* Clone in the container

* Use github job container

* Add runs-on

* Revert back to docker based jobs

* Install instead of develop

* Use tar in jobs

* Update cmake

* Skip clone

* Manual fixing white space

* Remove comments

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [llvm] Make cache writer support BC format (#4978)

* [Build] Improve Windows build script (#4955)

* Improve Windows build script

* Switch to clean up intermediates

* [refactor] Improve serializer and cleanup utils (#4980)

* [refactor] Improve serializer and cleanup utils

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [llvm] Support both BC and LL cache format (#4979)

* [llvm] Support both BC and LL cache format

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rm

* fix fs

* fix

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [misc] Add ASTSerializer::visit(ReferenceExpression *) (#4984)

* [bug] Fix infinite recursion of get_offline_cache_key_of_snode_impl() (#4983)

* Fix infinite recursion of get_offline_cache_key_of_snode_impl

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix some comments

* Fix

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [cuda] Add block and grid level intrinsic for cuda backend (#4977)

* Add block/grid level intrinsics

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix syntax

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Workflow] Update release_test.sh (#4960)

Co-authored-by: Chengchen(Rex) Wang <14366016+rexwangcc@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Provision of prebuilt LLVM 10 for VS2022 (#4987)

* [llvm] Use serializer for LLVM cache (#4982)

* [llvm] Use serializer for LLVM cache

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ctor

* fix

* fix to pointer

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix order

* wip

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* fix

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Doc] Fix docs deploy netlify test configuration (#4991)

* fix docs deploy netlify test configuration

* check netlify change to run docs preview

* [Doc] Updated URL (#4990)

* Updated URL

* Updated URL

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Doc] Update trouble shooting URL in bug report template (#4988)

* [Lang] [type] Refactor quantized_types module and make quant APIs public (#4985)

* [Type] Refactor quantized_types module and make quant APIs public

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix pylint

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update README.md

Tests if netlify is still working

* [test] Fix a few mis-configured ndarray tests (#5000)

* [Doc] Fix netlify cache & sync doc without pr content (#5003)

* [refactor] Program owns allocated ndarrays.

The end goal of this refactor is to let Ndarray be a simple wrapper around
(DeviceAllocation, dtype, shape) without having to worry about memory
allocation/deallocation. But its current implementation is heavily coupled
with Program*, so an intermediate state would be:
- If created from Program, Ndarray handles device allocation in
ctor/dtor.
- We'll add another ctor that simply constructs Ndarray from
(DeviceAllocation, dtype, shape) and update the codebase to use it.

ghstack-source-id: bdfd24154428a3fd92ca05688333509d0402e53a
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/4996
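
The intended end state of this refactor can be sketched as a plain-Python mock. All class and field names below are illustrative stand-ins, not the actual Taichi C++ types: the point is only that Ndarray holds (allocation handle, dtype, shape) and derives everything else, allocating nothing itself.

```python
# Illustrative-only sketch: Ndarray as a thin wrapper around
# (DeviceAllocation, dtype, shape), with allocation owned elsewhere.
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class DeviceAllocation:
    """Opaque handle to device memory owned by the Program/runtime."""
    device_id: int
    alloc_id: int


@dataclass(frozen=True)
class Ndarray:
    """Views an existing allocation; never allocates or frees memory."""
    alloc: DeviceAllocation
    dtype_size: int  # bytes per scalar element
    shape: tuple

    @property
    def nelement(self) -> int:
        return prod(self.shape)

    @property
    def nbytes(self) -> int:
        return self.nelement * self.dtype_size


arr = Ndarray(DeviceAllocation(0, 42), dtype_size=4, shape=(8, 8))
print(arr.nelement, arr.nbytes)  # 64 256
```

With this shape, deallocation policy (Program-owned vs. GC-triggered, as the later commits in this stack discuss) stays entirely outside the Ndarray class.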

* [test] Add test for Ndarray from DeviceAllocation

ghstack-source-id: 7d7c5486bce0a491170b52ba3ae809b4853d5447
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/4997

* [refactor] Construct ndarray from existing DeviceAllocation.

The end goal is to make this the only ctor for the Ndarray class.

ghstack-source-id: a7294096285b3879a84ffbb41196a136b2b605a8
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/4998

* [refactor] Free ndarray's memory when python GC triggers

Previously, Program managed the lifetime of all allocated ndarrays, so
calling `del ndarray` in Python did not free its memory. This PR
changes the behavior so that `ndarray` memory gets deallocated when Python
GC triggers, or when its containing `Program` gets destructed, whichever
happens first. There are some quirks around how we handle the async
Python GC and manual `ti.reset()`. Thanks to k-ye, we now add a
`generation` number to track the containing program instance of ndarrays
so that memory deallocation happens correctly.

ghstack-source-id: 4fdef9c8285e2188c4afaffe6febdd45e5164b15
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/4999

* [refactor] Move ndarray fast fill methods to Program

This PR gets rid of `LlvmProgramImpl*` member inside `Ndarray` class,
which is a step closer towards decoupling `Ndarray` and memory
management.

ghstack-source-id: 181d28eba3ded5c95d8c70c95a293505bbfebf01
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5002

* [refactor] Get rid of data_ptr_ in Ndarray

ghstack-source-id: d795592d21f4a3da4a6c8ccffdce1dbc40ad99aa
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5004

* [Doc] Branding updates. Also tests netlify. (#4994)

* Branding updates. Also tests netlify.

* Minor editorial updates to trigger netlify preview.

* Minor updates to re-trigger CI/CD

* [AOT] Supported inclusion of taichi as subdirectory for AOT modules (#5007)

* Support building taichi as CMake subdirectory

* Fixes for export-less integration on Android

* [misc] Version bump: v1.0.2 -> v1.0.3 (#5008)

* [Lang] [type] Fix parameter name 'range' for ti.types.quant.fixed (#5006)

* [SIMT] Add match_any warp intrinsics (#4921)

* add match_any warp intrinsic

del f32

reset

* alter predicate to value

* update warp.py to sync with PR4957

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [doc] Update community section (#4943)

* [doc] Update community section

add active events and communication

* fix typo

* refine docs

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>

* Update README.md

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>

Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [llvm] Add serializable LlvmLaunchArgInfo (#4992)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [bug] Fixed numerical error for Atomic-Sub between unsigned values with different number of bits (#5011)

* [refactor] Move get ndarray data ptr to program (#5012)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [ci] [build] Enable ccache for windows docker (#5001)

* Enable ccache for windows docker

* only run windows docker job

* copy ccache_folder

* trigger CI

* Re-enable all jobs

* remove dumb text

* [test] Unify kernel setup for ndarray related tests

We'll reuse these two kernels for cgraph tests as well, so let's clean
them up first.

ghstack-source-id: acd772f092ac9044197ac2e1f16100ba4ba9005d
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5014

* [aot] Build and run graph without serialization

This PR serves as the base PR with a minimal example of building and
running a Graph. Runtime values for graph arguments can be either
scalars or ndarrays.

For detailed proposal please see #4786.

Things handled in this PR:
- Maximize common code/runtime shared by the two workflows below:
  1. build -> compile -> run
  2. build -> compile -> serialize -> deserialize -> run
- Graph arguments are annotated with dtype and element shape for ndarray (temporary
until we have vec3 types in C++)

Things that we've discussed but not included in this PR:
- C API: I'll leave that for a unified C API PR in the future.
- bind IValues to graph: easy, will add later.

ghstack-source-id: f459afccdde56b59ab0ecc860ed11d761a20fe0a
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5015

* [Llvm] Add AOT builder and loader (#5013)

* [Llvm] Add AOT builder and loader

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check nullptr

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [ci] Fix nightly macos (#5018)

* [bug] Revert freeing ndarray memory when python GC triggers (#5019)

* [SIMT] Add match_all warp intrinsics (#4961)

* add match_all warp intrinsic by ptx

* add args to match_all in warp.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update warp.py to sync with PR4957

* update llvm_context.cpp: add more details about match_all_sync intrinsic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_simt.py

Initialize a with 1

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [AOT] Support importing external Vulkan buffers (#5020)

* [Bug] [type] Fix frontend type check for reading a whole bit_struct (#5027)

* fix fast_gui rgba bug (#5031)

* [doc] Update OS names (#5030)

* [ci] Disable win cpu docker job test (#5033)

* Disable win cpu docker job test

* Revert changes on naming

* [aot] Serialize built graph, deserialize and run.

related: #4786

[Update]: based on an offline discussion with k-ye, I've split the
original `Graph` class into `GraphBuilder` and `CompiledGraph` classes
in C++. Note that the implementation doesn't follow the builder design
pattern exactly, as our builder is slightly simpler, as shown below.
The complexity in our problem lies more in the need to serialize and
deserialize the same graph representation rather than in its
construction process. So IMHO it's good enough to separate the
GraphBuilder and the Runner (`CompiledGraph`) as we discussed. Please
feel free to correct me if I'm wrong!

```
GraphBuilder
    |
 compile()
    |
    |
CompiledGraph <----  serialize/deserialize ----> file
    |
    |
   run()
```

This PR demonstrates a minimal example of serializing a built graph,
deserializing and running it.

ghstack-source-id: 7dda7cc11ef3a946f31d75783a8cfd1836e47ba5
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5024
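
The builder/runner split in the diagram above can be sketched in a few lines of plain Python. These are illustrative mocks, not the actual C++ classes: `GraphBuilder` only records dispatches, `compile()` hands off a `CompiledGraph`, and it is the `CompiledGraph` representation (not the builder) that survives a serialize/deserialize round trip and can then `run()`.

```python
# Minimal mock of the GraphBuilder -> compile() -> CompiledGraph pipeline,
# with serialization applied to the compiled form only.
import pickle


class CompiledGraph:
    def __init__(self, dispatches):
        # dispatches: list of (kernel_name, arg_names) recorded by the builder
        self.dispatches = dispatches

    def run(self, args):
        # Stand-in for launching each compiled kernel with bound arguments.
        return [(name, [args[a] for a in arg_names])
                for name, arg_names in self.dispatches]


class GraphBuilder:
    def __init__(self):
        self._dispatches = []

    def dispatch(self, kernel_name, *arg_names):
        self._dispatches.append((kernel_name, list(arg_names)))

    def compile(self):
        return CompiledGraph(self._dispatches)


builder = GraphBuilder()
builder.dispatch("substep", "x", "v")
graph = builder.compile()

# Round-trip: the compiled representation is serialized to bytes and
# deserialized (in practice, in a separate process), then run directly.
restored = pickle.loads(pickle.dumps(graph))
print(restored.run({"x": 1.0, "v": 2.0}))
```

Keeping serialization on the compiled form matches the PR's point: the same `CompiledGraph` code path serves both the build-and-run and the deserialize-and-run workflows.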

* [aot] Move ArgKind as first argument in Arg class

Thought this might be more intuitive for users.

ghstack-source-id: 865062f0982db4d69a41ba345a1d254c2054a12f
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5025

* [aot] Bind graph APIs to python and add mpm88 example (#5034)

This PR supports graph builder and runner APIs in Python. Note that for
simplicity I've merged the builder and runner into the same Python class.
Please feel free to comment if you have any suggestions.

This PR also adds a test of saving mpm88 graph in aot module, as well
as an example script to demonstrate the speed improvement (15fps ->
45fps) compared to the current taichi.

ghstack-source-id: 600e604b141f9e534045f930d8424125c38ed875
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5026

* [Lang] [type] Refactor quant type definition APIs (#5036)

* [Lang] [type] Refactor quant type definition APIs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Metal] Support Ndarray (#4720)

* [Metal] Support Ndarray

* simple work

* fix copying

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* wip

* fixes

* fix devalloc id bug, enable tests

* fix extra_arg offset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove size

* rm size

* ref

* fix for ret matrix type

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix test

* fix zero

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Lang] Fix potential precision bug when using math vector and matrix types (#5032)

* fix fast_gui rgba bug

* fix floating precision problem for vector types

* add more flexible initialization methods for matrix type

* add more flexible initialization methods for matrix type

* add glsl determinant and inverse function support

* add glsl determinant and inverse function support

* add glsl determinant and inverse function support

* add glsl determinant and inverse function support

* add glsl determinant and inverse function support

* fix matrix precision type bug and use matrix-member inverse

* fix matrix precision type bug and use matrix-member inverse

* [Vulkan] Fixed vulkan backend crash on AOT examples (#5047)

* Exit CI builds when download of prebuilt packages fails (#5043)

* [ci] Run cpp tests via run_tests.py (#5035)

* [ci] Run cpp tests via run_tests.py

* default to False

* enable cpp on win

* Set host_write to false for opengl ndarray (#5038)

As discussed, ndarrays can be written to by calling write kernels, but
they shouldn't support being directly mapped on the host and written to.

* [Lang] Build sparse matrix from ndarray (#4841)

* build sparse matrix from ndarray

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add shape property

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix errors

* fix pylint

* fix failed test

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve

* improve

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ndarray data ptr not found

* pylint

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check ndarray dimension when building sparse matrix

* improve

* add example docstring

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [bug] Added type promotion support for atan2 (#5037)

* [bug] Added type promotion rule for atan2

* Fixed minor issue

* Modified type promotion rule

* [Doc] Updated type system (#5054)

* Editorial updates

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [bug] Ndarray type should include primitive dtype as well (#5052)

This PR does three things:
- Switch cgraph `Arg` to take in `ti.f32/i32` instead of string
`f32/i32` as inputs
- Fix a bug that when we produce injected ndarray args for compilation,
we only produced f32 ndarrays, which won't work for ndarrays of other
primitive dtypes.
- No need to specify `element_shape` if it's scalar arg or scalar
ndarray arg.
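
A plain-Python mock of why this change matters. All names here are hypothetical stand-ins (a tiny `DType` record plays the role of `ti.f32`/`ti.i32`): once the graph `Arg` carries a real dtype object instead of a string, the placeholder ndarray injected at compile time can be sized from that dtype rather than always assuming f32, and `element_shape` can default to empty for scalar args.

```python
# Illustrative mock: dtype objects on graph Args drive placeholder sizing.
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    name: str
    nbytes: int


f32 = DType("f32", 4)
i32 = DType("i32", 4)
f64 = DType("f64", 8)


@dataclass(frozen=True)
class Arg:
    kind: str                  # "scalar" or "ndarray"
    name: str
    dtype: DType
    element_shape: tuple = ()  # only meaningful for ndarray args


def placeholder_nbytes(arg: Arg, field_shape: tuple) -> int:
    """Size of the placeholder array injected for compilation."""
    n = 1
    for d in (*field_shape, *arg.element_shape):
        n *= d
    return n * arg.dtype.nbytes


pos = Arg("ndarray", "pos", f64, element_shape=(2,))
print(placeholder_nbytes(pos, (100,)))  # 1600, not the 800 an f32 default would give
```

With the string-based scheme, every placeholder would have been built as f32 (4 bytes); carrying the dtype object through makes the f64 case come out correctly sized.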

* [Lang] [ir] Add short-circuit if-then-else operator (#5022)

* [Lang] [ir] Add a short-circuit if-then-else operator and use it to implement IfExp

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Lang] Struct Classes implementation (#4989)

* Initial Struct Classes implementation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update method translation to mark as taichi funcs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert unwanted changes to impl.py

* Update test_api.py

* Update struct.py

Update with review comments.

* Update class decorator docstring

* Update func marking and add tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update tests/python/test_custom_struct.py

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Update docstrings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update python/taichi/lang/struct.py

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Update python/taichi/lang/struct.py

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Update python/taichi/lang/struct.py

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* [Lang] [type] Disallow reading a whole bit_struct (#5061)

* [bug] Remove operator ! for Expr (#5062)

* [build] [bug] Ensure the assets folder is copied to the project directory (#5063)

* Bugfix: ensure the assets folder is copied

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* call copy_assets before setup()

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [refactor] Split GraphBuilder out of Graph class (#5064)

* [aot] [CUDA-AOT PR #0] Refactored compile_module_to_executable() to CUDAModuleToFunctionConverter (#5070)

* [refactor] Specialized Ndarray Type is (element_type, shape, layout)

ghstack-source-id: 977cd453359b8ccc09deccacc62a915abcd42734
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5065

* [refactor] Pass element_shape and layout to C++ Ndarray

Note we still flatten element_shape in the C++ Ndarray, which is blocked by the accessors and will be fixed
in the following PRs.

ghstack-source-id: 0cb5c05f0ad4c188546d7174a1d82f398bc717c2
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5066

* [Lang] Support constructing vector and matrix ndarray from ti.ndarray()

ghstack-source-id: 3055ba79c35aecea61c449038b4f8c07e87571b9
Pull Request resolved: https://github.com/taichi-dev/taichi/pull/5073

* [refactor] Resolve comments from #5065 (#5074)

* [Example] Update mass_spring_3d_ggui.py to v2 (#3879)

* cleaner mass_spring_3d_ggui.py

* fix collision

* Fix penetration

* No capitalized globals

* fix compute_force

* parameter tweaks

* Looks good now

* Update TaichiCore.cmake

* Update mass_spring_3d_ggui.py

* Update mass_spring_3d_ggui.py

Change variable name `allow_bending` to `bending_springs`.

* [doc] Fix broken link for github action status badge (#5076)

* [doc] Fix link for github action status badge

* Update README.md

Co-authored-by: Bo Qiao <boqiao@taichi.graphics>

* Update README.md

Co-authored-by: Bo Qiao <boqiao@taichi.graphics>

Co-authored-by: Bo Qiao <boqiao@taichi.graphics>

* [llvm] Specialize element shape for LLVM backend (#5071)

* Specialize element shape for LLVM backend

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [spirv] Specialize element shape for spirv codegen. (#5068)

* Specialize element shape for spirv codegen.

* Fix index for size_var_names

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Slight changes for better code style.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [Lang] Add more initialization routines for glsl matrix types (#5069)

* add more initialization routines for glsl matrix types

* add more initialization routines for glsl matrix types

* [cuda] [simt] Add assertions for warp intrinsics on old GPUs (#5077)

* Add guard for cc smaller than 70

* Fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix pylint

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [refactor] Correctly set ndarray element_size and nelement (#5080)

* [infra] Refactor Vulkan runtime into true Common Runtime (#5058)

* Remove all references to Vulkan in common runtime & fix device API for OpenGL (bindings) and DirectX 11 (memory leaks)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix cpp test

* update

* update

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [llvm] [aot] CUDA-AOT PR #1: Extracted common logics from CPUAotModuleImpl into LLVMAotModule (#5072)

* [llvm] [aot] CUDA-AOT PR #1: Extracted common logics from CPUAotModuleImpl into LLVMAotModule

* Renamed LLVMAotModule

* Fixed minor issue

* [llvm] [refactor] Merge AtomicOpStmt codegen in CPU and CUDA backends (#5086)

* [llvm] [refactor] Merge AtomicOpStmt codegen in CodeGenLLVMCUDA and CodeGenLLVM

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [refactor] Make sure Ndarray shape is field shape (#5085)

* [autodiff] Allocate dual and adjoint snode (#5083)

* allocate dual and decouple grad and adjoint

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* update

* update the adjoint name

* fix matrix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* recover the grad name

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [build] [refactor] Change CMake global include_directories to target based function (#5082)

* Change to target_include_directories

* Update runtime cmake

* Pre-commit format

* [Doc] Add documentation of Taichi Struct Classes. (#5075)

* Add documentation of Taichi Struct Classes.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* edits

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update docs/lang/articles/advanced/odop.md

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Update docs/lang/articles/advanced/odop.md

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Update docs/lang/articles/advanced/odop.md

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Update docs/lang/articles/advanced/odop.md

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Update docs/lang/articles/advanced/odop.md

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Update docs/lang/articles/advanced/odop.md

Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* Fix capitalization of Python

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>

* [llvm] [aot] Add LLVM-CPU AOT tests (#5079)

* [llvm] [aot] Add LLVM-CPU AOT tests

* Refactored AOT test framework

* Fixed minor issue

* Enabled LLVM CPU-AOT for arm64 architecture

* Added aot unit tests programming guide

* Fixed typo

* Refactored AOT test framework

* [autodiff] Extract shared components for reverse and forward mode (#5088)

extract shared components for reverse and forward mode

* [llvm] [refactor] Use LLVM native atomic ops if possible (#5091)

* [llvm] [refactor] Use LLVM native atomic ops if possible

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [bug] Minor fix for ndarray element_shape in graph mode (#5093)

Now that ndarray's element_shape is separated from shape, this hack can
be removed.

* Use pre-calculated runtime size array for gfx runtime. (#5094)

* [Doc] Improve ODOP doc structure (#5089)

* [llvm] [aot] CUDA-AOT PR #2: Implemented AOTModuleLoader & AOTModuleBuilder for LLVM-CUDA backend (#5087)

* [llvm] [aot] Add LLVM-CPU AOT tests

* Refactored AOT test framework

* Fixed minor issue

* Enabled LLVM CPU-AOT for arm64 architecture

* Added aot unit tests programming guide

* [llvm] [aot] CUDA-AOT PR #2: Implemented AOT Module Loader for LLVM-CUDA backend

* Fixed typo

* Fixed minor issue

* Refactored AOT test framework

* [llvm] [aot] Add LLVM-CUDA AOT tests

* Added cuda device availability check

* clean hidden override functions (#5097)

* [refactor] Update Ndarray constructor used in AOT runtime. (#5095)

This constructor is mainly used to construct an Ndarray out of an
existing device allocation. This PR updates the behavior of this
constructor to separate element_shape out of shape.
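A hypothetical C++ pseudocode sketch of what such a constructor call could look like after this change (the type and argument names are assumptions, not the exact signature):

```cpp
// Pseudocode sketch; argument names and order are assumptions.
// An Ndarray wrapping an existing device allocation, with the per-element
// shape kept separate from the array shape instead of flattened into it.
Ndarray arr(/*devalloc=*/alloc,
            /*dtype=*/PrimitiveType::f32,
            /*shape=*/{n_particles},
            /*element_shape=*/{2});  // e.g. an array of 2D vectors
```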

* [refactor] Remove ndarray element shape from extra arg buffer (#5100)

* Remove element shape from extra args.

* [llvm] [refactor] Move load_bit_pointer() to CodeGenLLVM (#5099)

* [llvm] [refactor] Move load_bit_pointer() to CodeGenLLVM

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [test] Save mpm88 graph in python and load in C++ test. (#5104)

This is a simplified version of
https://github.com/ailzhang/taichi-aot-demo/tree/mpm88_cgraph_demo which
strips the GGUI rendering part. Let's add this as a test (as well as a
demo ;) ) in the codebase. We previously tested only the saving part of
mpm88; that test is now replaced with this end-to-end test.

Huge thanks to @k-ye for help debugging the GGUI rendering issue!
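Conceptually, what gets saved is just an ordered list of kernel dispatches plus descriptions of their symbolic arguments. A toy, runnable round-trip illustration in plain Python (this is NOT Taichi's actual serialization format, only the idea behind the e2e test above):

```python
import json

# Toy model of a serialized compute graph (NOT Taichi's real format):
# an ordered list of kernel dispatches with symbolic argument metadata.
graph = {
    "name": "run_sim",
    "dispatches": [
        {"kernel": "substep",
         "args": [{"name": "x", "kind": "ndarray", "dtype": "f32", "ndim": 2}]},
        {"kernel": "render",
         "args": [{"name": "img", "kind": "ndarray", "dtype": "f32", "ndim": 2}]},
    ],
}

serialized = json.dumps(graph)   # what the Python side would write out
loaded = json.loads(serialized)  # what a runtime loader would read back

assert loaded == graph           # lossless round trip
print(loaded["dispatches"][0]["kernel"])  # -> substep
```

The point of the e2e test is exactly this round trip: the host logic (the dispatch order) survives serialization, so the C++ runtime can replay it without any Python.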

* [Example] Update visual effects of mass_spring_3d_ggui.py (#5081)

* update scene for mass_spring simulation

* update scene for mass_spring simulation

* update scene for mass_spring simulation

* [type] [refactor] Remove redundant promotion for custom int in type_check (#5102)

* [llvm] [refactor] Replace cast_int() with LLVM native integer cast (#5110)

* [llvm] [refactor] Use LLVM native integer cast

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [type] [llvm] [refactor] Fix function names in codegen_llvm_quant (#5115)

* [type] [llvm] [refactor] Fix function names in codegen_llvm_quant

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [bug] Fix build without llvm backend crash (#5113)

* [bug] Fix build without llvm backend crash

* Update taichi/python/export_lang.cpp

Co-authored-by: yekuang <k-ye@users.noreply.github.com>

Co-authored-by: yekuang <k-ye@users.noreply.github.com>

* [build] [refactor] Move Vulkan runtime out of backends dir (#5106)

* Precommit fix

* Add spirv source

* Move device code back to backends

* Expose glfw include in vulkan rhi

* Fix llvm include

* Fix include for test

* [autodiff] Add forward mode pipeline for autodiff pass (#5098)

* Add forward mode pipeline for autodiff pass
* Replace the grad parameter with AutodiffMode to distinguish three kinds of kernels: primal, forward AD, and reverse AD

* [aot] [llvm] LLVM AOT Field #0: Implemented FieldCacheData & refactored initialize_llvm_runtime_snodes() (#5108)

* [aot] [llvm] Implemented FieldCacheData and refactored initialize_llvm_runtime_snodes()

* Addressed compilation errors

* Added initialization for struct members

* Minor fix

* [aot][bug] Use cached compiled kernel pointer when it's added to graph multiple times (#5122)

This bug was triggered when we tried to port the stable_fluid demo, so this
PR also adds a cgraph-based stable fluid demo.

```
ti example stable_fluid_graph
```

Note it's not ideal to save both `FunctionType compiled_` and
`aot::Kernel compiled_aot_kernel_` inside the C++ `Kernel` class, but we plan to clean that
up (likely by getting rid of `FunctionType compiled_`) in #5114.

* [aot] [llvm] LLVM AOT Field #1: Adjust serialization/deserialization logics for FieldCacheData (#5111)

* [aot] [llvm] Implemented FieldCacheData and refactored initialize_llvm_runtime_snodes()

* Addressed compilation errors

* [aot] [llvm] LLVM AOT Field #1: Adjust serialization/deserialization logics for FieldCacheData

* Editorial update (#5119)

* [lang] Texture support 0/n: IR changes (#5134)

* fix mass_spring_3d_ggui backend (#5127)

* [Example] Fix block_dim warning in ggui (#5128)

* fix block dim warning in ggui

* fix block dim warning in ggui

* fix block dim warning in ggui

* [ci] Enable yapf and isort on example files (#5140)

Note we explicitly exclude running pylint on them as it requires a bunch
of manual fixes first.

* [type] [refactor] Misc improvements to quant codegen (#5129)

* Replace is_custom_type() with is_quant()

* Rename two functions

* Use get_constant() if possible

* Rename two metal functions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [aot] [llvm] LLVM AOT Field #2: Updated LLVM AOTModuleLoader & AOTModuleBuilder to support Fields (#5120)

* [aot] [llvm] Implemented FieldCacheData and refactored initialize_llvm_runtime_snodes()

* Addressed compilation errors

* [aot] [llvm] LLVM AOT Field #1: Adjust serialization/deserialization logics for FieldCacheData

* [llvm] [aot] Added Field support for LLVM AOT

* [aot] [llvm] LLVM AOT Field #2: Updated LLVM AOTModuleLoader & AOTModuleBuilder to support Fields

* Fixed merge issues

* Stopped abusing Program*

Co-authored-by: Frost Ming <mianghong@gmail.com>
Co-authored-by: Taichi Gardener <taichigardener@gmail.com>
Co-authored-by: Ailing <ailzhang@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Taichi Gardener <62079278+taichi-gardener@users.noreply.github.com>
Co-authored-by: Zhanlue Yang <zy2284@columbia.edu>
Co-authored-by: Bo Qiao <boqiao@taichi.graphics>
Co-authored-by: Haidong Lan <turbo0628g@gmail.com>
Co-authored-by: PGZXB <420254146@qq.com>
Co-authored-by: Bob Cao <bobcaocheng@gmail.com>
Co-authored-by: yekuang <k-ye@users.noreply.github.com>
Co-authored-by: Vissidarte-Herman <93570324+Vissidarte-Herman@users.noreply.github.com>
Co-authored-by: Zhanlue Yang <jim19930609@gmail.com>
Co-authored-by: Gabriel H <64807734+ghuau-innopeak@users.noreply.github.com>
Co-authored-by: 0xzhang <33616362+0xzhang@users.noreply.github.com>
Co-authored-by: yixu <BillXu2000@126.com>
Co-authored-by: Zeyu Li <47965866+GaleSeLee@users.noreply.github.com>
Co-authored-by: pengyu <6712304+FantasyVR@users.noreply.github.com>
Co-authored-by: Yi Xu <xy_xuyi@foxmail.com>
Co-authored-by: taichiCourse01 <tgc01@taichi.graphics>
Co-authored-by: Chang Yu <g1n0st@live.com>
Co-authored-by: PENGUINLIONG <admin@penguinliong.moe>
Co-authored-by: Lin Jiang <90667349+lin-hitonami@users.noreply.github.com>
Co-authored-by: Haidong Lan <haidonglan@taichi.graphics>
Co-authored-by: YuZhang <YuCrazing@users.noreply.github.com>
Co-authored-by: Chuandong Yan <90600320+chuandongyan@users.noreply.github.com>
Co-authored-by: Chengchen(Rex) Wang <14366016+rexwangcc@users.noreply.github.com>
Co-authored-by: Justin <62801799+Justinterest@users.noreply.github.com>
Co-authored-by: Ailing Zhang <ailing@taichi.graphics>
Co-authored-by: Zeyu Li <li_zeyu@pku.edu.cn>
Co-authored-by: yanqingzhang <yanqingdw@gmail.com>
Co-authored-by: daylily <xy.r@outlook.com>
Co-authored-by: bsavery <brian.savery@gmail.com>
Co-authored-by: Alex Brown <96645475+AlexBrown42@users.noreply.github.com>
Co-authored-by: Bo Qiao <qiao.bo@outlook.com>
Co-authored-by: Mingrui Zhang <33411325+erizmr@users.noreply.github.com>
Co-authored-by: Olinaaaloompa <106292061+Olinaaaloompa@users.noreply.github.com>
@ailzhang (Contributor, Author)

FYI you can find a few examples of loading serialized mpm88/sph/stable fluid demos and launching them in C++ at https://github.com/ailzhang/taichi-aot-demo. We're in the process of adjusting and finalizing the public-facing API and will send more updates soon!
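Loading on the C++ side follows the shape sketched in the issue description; a pseudocode outline (the loader, context, and type names here are assumptions drawn from that sketch, not the finalized runtime API):

```cpp
// Pseudocode sketch based on the API outline in the issue description;
// actual C++ runtime names may differ.
auto aot_module = AotModuleLoader::load("module_dir", arch);
auto graph = aot_module->load_graph("run_sim");  // deserialize the graph

RuntimeContext host_ctx;
// ...bind ndarrays / scalars to the graph's symbolic args here...
graph->run(host_ctx);                            // replay the dispatches
```

The key property is that `run()` replays the saved host logic (the kernel dispatch sequence), so only argument binding remains on the application side.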

@k-ye k-ye removed this from the Taichi v1.1.0 milestone Aug 2, 2022