[RFC] Taichi's Ahead of Time (AOT) Module #3642
Nice summary! I really like the idea of the API to load the kernels from C++! I have a question related to the arguments of a Taichi kernel in the example: for access to the fields, would it help to make the root buffer readable from the host? I know that on mobile targets everything is in unified memory, so performance is generally not impacted, but I'm not sure about desktop targets.

GLuint x_ssbo;
glGenBuffers(1, &x_ssbo);
char buffer[N][M];
auto x_field = program.GetField("x");
x_field.CopyTo(x_ssbo);
// Or
x_field.CopyTo(buffer);

This can be useful, I guess, when a computation is done on Vulkan/OpenGL but the results might be used on a different processor/backend (CPU?). We could still glMapBuffer the SSBO, I guess, but this would just simplify the code for the users of the library :).

One thing I have in mind too is a configuration of the 'target' backend for the AOT modules. For example, we can generate the OpenGL/Vulkan module on our desktop machine, which might have a high-end GPU and a recent driver with plenty of extensions, but the execution of those AOT modules might be on a more limited machine that doesn't have those extensions. Could we provide a configuration of the extensions through Python, so that when we initialize the Program Runtime, instead of checking the system, we also filter against the ones provided in this list?

def run_aot():
    mod = ti.aot.Module(arch=ti.opengl, target_ext_config='config.txt')
    mod.add_kernel(init)
    mod.add_kernel(substep)
    mod.save('/path/to/dir', 'opengl.tcb')  # .tcb for "taichi binary"

with config.txt:
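For illustration, config.txt could simply list the extensions the target device is known to support, one per line (the extension names below are just examples, not a proposed format):

GL_ARB_compute_shader
GL_ARB_gpu_shader_int64
GL_NV_shader_atomic_float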
Random thought: in the future, it may make sense to wrap kernels and fields into a class for more modularity. This may end up with a syntax similar to ...

@ti.aot.module
class ParticleSystem:
    def __init__(self, n_particles):
        self.n_particles = n_particles
        # Persistent states
        with self.export_fields:
            self.x = ti.Vector.field(2, float, n_particles)
            self.v = ti.Vector.field(2, float, n_particles)

    @ti.kernel(export=True)
    def substep(self, dt: ti.f32):
        for i in self.x:
            self.x[i] += dt * self.v[i]

    @ti.kernel(export=True)
    def fetch_x(self, x_output: ti.types.Vector.ndarray(n=2, dtype=ti.f32, dim=1)):
        for i in self.x:
            x_output[i] = self.x[i]

    @ti.kernel(export=True)
    def compute_total_distance(self, dist_sum: ti.types.ndarray(dtype=ti.f32, dim=1)):
        # Temporary field, destroyed when the kernel exits
        distance = ti.field(...)
        for i in self.x:
            for j in range(self.n_particles):
                distance[i, j] = (self.x[i] - self.x[j]).norm()
        for i in self.x:
            dist_sum[i] = 0.0
            for j in range(self.n_particles):
                dist_sum[i] += distance[i, j]
Thanks for the reply!

Variables like …

Yep. From a technical perspective, this should be quite straightforward :-) Updated the C++ API part.

Yeah, the extensions have introduced some trouble for us previously. Passing in a configuration list of the extensions the targeted platform can support sounds like a nice solution :+1:
Just realized that we need a way to support more than one dumped module. For example, if I have two Taichi scripts 'a.py' and 'b.py', each of them saving an AOT module, we need a way to support using the saved kernels from both scripts at the same time; see the sketch below.
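A minimal sketch of what this could look like on the app side, assuming the taichi::aot::Module::load API from the runtime example later in this thread (paths, kArch and mod_params as in that example):

// Load the AOT modules produced by a.py and b.py side by side.
auto mod_a = taichi::aot::Module::load("/path/to/a_module", kArch, mod_params);
auto mod_b = taichi::aot::Module::load("/path/to/b_module", kArch, mod_params);
// Kernel names only need to be unique within their own module, so both
// scripts may export an "init" kernel without clashing.
auto init_a = mod_a->get_kernel("init");
auto init_b = mod_b->get_kernel("init");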
Additional features we should consider supporting:
I think we can have a …
More updates on the technical direction.

What We Have So Far
Runtime API

Example usage

I hope we can simplify the above usage and make it generic for other backends. Below is what I imagine to be a slightly better API:

constexpr auto kArch = taichi::lang::Arch::vulkan;
// ... same as above (`evd_params`, `result_buffer` and `host_ctx` are set up
// as in the full example).

// This gives us the flexibility to plug in the user's own VkDevice, if
// they already have one in their pipeline.
auto embedded_device =
    std::make_unique<taichi::vulkan::VulkanDeviceCreator>(evd_params);

taichi::vulkan::VkRuntime::Params params;
params.host_result_buffer = result_buffer;
params.device = embedded_device->device();
auto vulkan_runtime =
    std::make_unique<taichi::vulkan::VkRuntime>(std::move(params));

std::any mod_params = vulkan_runtime.get();
std::unique_ptr<taichi::aot::Module> vk_module =
    taichi::aot::Module::load("/path/to/aot_module", kArch, mod_params);
if (!vk_module) {
  printf("Cannot load Vulkan AOT module\n");
  return -1;
}

// Retrieve kernels/fields/etc. from the AOT module so we can initialize our
// runtime.
auto root_size = vk_module->get_root_size();
printf("root buffer size=%d\n", root_size);
vulkan_runtime->add_root_buffer(root_size);

auto substep_kernel = vk_module->get_kernel("substep");
if (!substep_kernel) {
  printf("Cannot find 'substep' kernel\n");
  return -1;
}

// Run `substep_kernel`
int n_particles = 8192;
std::vector<float> x(n_particles * 2);
for (int i = 0; i < 50; i++) {
  substep_kernel->launch(&host_ctx);
}
vulkan_runtime->synchronize();

auto x_field = vk_module->get_field("x");
if (!x_field) {
  printf("Cannot find 'x' field\n");
  return -1;
}
// Device-to-host copy; the size is stored in `x_field` already.
x_field->copy_to(/*dst=*/x.data());

In order to achieve this, we need a few new APIs and refactors.
class LaunchContextBuilder {
 public:
  LaunchContextBuilder(Kernel *kernel, RuntimeContext *ctx);
  explicit LaunchContextBuilder(Kernel *kernel);

  LaunchContextBuilder(LaunchContextBuilder &&) = default;
  LaunchContextBuilder &operator=(LaunchContextBuilder &&) = default;
  LaunchContextBuilder(const LaunchContextBuilder &) = delete;
  LaunchContextBuilder &operator=(const LaunchContextBuilder &) = delete;

  void set_arg_float(int arg_id, float64 d);
  void set_arg_int(int arg_id, int64 d);
  void set_extra_arg_int(int i, int j, int32 d);
  void set_arg_external_array(int arg_id,
                              uintptr_t ptr,
                              uint64 size,
                              bool is_device_allocation);
  void set_arg_external_array_with_shape(int arg_id,
                                         uintptr_t ptr,
                                         uint64 size,
                                         const std::vector<int64> &shape);
  void set_arg_ndarray(int arg_id, const Ndarray &arr);

  // Sets the |arg_id|-th arg in the context to the bits stored in |d|.
  // This ignores the underlying kernel's |arg_id|-th arg type.
  void set_arg_raw(int arg_id, uint64 d);

  RuntimeContext &get_context();

 private:
  Kernel *kernel_;
  std::unique_ptr<RuntimeContext> owned_ctx_;
  // |ctx_| *almost* always points to |owned_ctx_|. However, it is possible
  // that the caller passes a RuntimeContext pointer externally. In that case,
  // |owned_ctx_| will be nullptr.
  // Invariant: |ctx_| will never be nullptr.
  RuntimeContext *ctx_;
};
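A quick sketch of how this builder might be driven from the AOT example above (the `kernel` handle, arg indices and values are illustrative assumptions, not part of the design yet):

// Build a RuntimeContext for an AOT kernel launch.
LaunchContextBuilder builder(kernel);            // kernel: a Kernel* handle
builder.set_arg_float(/*arg_id=*/0, 0.016);      // e.g. dt
builder.set_arg_int(/*arg_id=*/1, 8192);         // e.g. n_particles
substep_kernel->launch(&builder.get_context());  // feed the built context
vulkan_runtime->synchronize();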
Problems
- As seen from the sample usage, users still have to manually call synchronize(). Ideally, this information can be encoded inside aot::Kernel::launch().
- We lack a way to configure a kernel's grid/dim settings. This will be particularly important for kernels iterating over sparse fields.
- To support merging multiple modules (of the same arch), we might need to invent the concept of namespaces within a module. So a kernel or a field belongs to a specific namespace.
- Should get rid of the heavy boilerplate, like setting up CompileConfig, MemoryPool, etc.
- Unify the AOT data structure for different backends.
Compile-time API
Much like what we are doing now. One tweak is that we should really, really group the files into a single package file, instead of a folder.
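For illustration, the single package could be an archive with a layout along these lines (purely hypothetical):

opengl.tcb
├── metadata.json   # kernels, fields, root buffer size, required caps, version
└── shaders/        # one compiled artifact per kernel / template instantiation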
Device Capabilities
There should be a way for us to control exactly which API extensions we want to enable when running the codegen. This is not a Vulkan-specific problem; it applies to OpenGL and Apple Metal as well.
Build the Runtime Library
We can draw ideas from TVM around https://github.com/apache/tvm/blob/2e32f36fecaa3d5025705a98594a9f4a4f6d9f74/CMakeLists.txt#L401-L406.
There will be a libtaichi_runtime.so, which includes all the runtime stuff, AOT included. Then the current libtaichi_core.so becomes libtaichi_runtime.so + many codegens + CHI IR infra + pybind + ...
@k-ye That looks awesome! Wow! I really like the different ideas and changes to the AOT API! :)
We have some basic heuristics to decide which Taichi kernels need to synchronize. For example, if a kernel takes in an Ndarray, we will call synchronize, because we don't yet know whether the kernel writes to that Ndarray. Another case is where the kernel returns a value. We can encode such info into the JSON as well. But I feel like manually controlling when to sync is good enough? A sketch of the metadata-driven option is below.
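If we did encode it, launch() could consult per-kernel metadata along these lines (the struct and field names are hypothetical):

// Hypothetical per-kernel metadata, deserialized from the module's JSON.
struct KernelSyncMeta {
  bool may_write_ndarray = false;
  bool has_return_value = false;
};

// A launch() variant that hides synchronize() when the metadata requires it.
void launch_with_auto_sync(taichi::aot::Kernel *kernel, RuntimeContext *ctx,
                           const KernelSyncMeta &meta,
                           taichi::vulkan::VkRuntime *runtime) {
  kernel->launch(ctx);
  if (meta.may_write_ndarray || meta.has_return_value) {
    runtime->synchronize();
  }
}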
When can we have a complete C++ API reference?
Hi @KishinZW, wondering if you have checked out https://liong.work/taichi-aot-by-examples.github.io/? We have just released an initial version of the C API, and haven't officially supported a C++ one yet.
In general, Taichi's AOT API recommends that you exchange data with the kernels via Ndarray; a sketch of that flow is below.
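In the C++ runtime terms quoted earlier in this thread (the released C API has its own equivalents; `kernel`, `aot_kernel`, `x_arr` and `runtime` are assumed handles):

// Bind the Ndarray, launch, then sync before reading it back on the host.
LaunchContextBuilder builder(kernel);
builder.set_arg_ndarray(/*arg_id=*/0, x_arr);
aot_kernel->launch(&builder.get_context());
runtime->synchronize();  // now x_arr's contents are safe to read back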
Showing images is done via Taichi's GUI component. However, Taichi's C API only focuses on the core concepts like kernels and data containers. That said, the Taichi AOT examples do come with a demo framework that includes a renderer. You can start from these materials:
We'd like to share our ideas on how to implement the AOT feature in Taichi. AOT refers to the process of using Taichi as a GPU compute shader/kernel compiler: The AOT users can compile their Taichi kernels into an AOT module, package this module alongside their apps (likely without a Python environment, e.g. an Android app), and load/execute the compiled shaders from this module in the app.
Note that AOT is already an ongoing work, hence some of the tasks have already been implemented. For a quick peek at the Taichi AOT workflow, please check out this test.
Goals
API Proposal
Taichi provides a utility, taichi.aot.Module, for compiling the Taichi kernels and fields information into a module file. It provides these APIs:

- add_kernel(kernel_fn): Adds a Taichi kernel to the AOT module.
- add_kernel_template(kernel_templ_fn): Adds a Taichi kernel template to the AOT module.
- add_field(name, field): Adds a Taichi field to the AOT module. However, we hope that Ndarray can serve as a more convenient dense data container in the AOT use cases.
- save(filepath, filename): Saves this AOT module to filepath/filename.

We will walk through the Module usage with the following example:

1. Create the module mod, targeted for the GL/ES shading language.
2. Define the kernels init and substep. This step adds both kernels to mod.
3. Define the fields x and v. Both are added to mod, too.
4. Save mod to /path/to/dir/opengl.tcb.

This completes the work required on the Taichi/Python side.

Assuming that we then want to deploy this to an Android app, and have added opengl.tcb to the app project, we imagine the following set of C++ APIs to be useful. Note that the language implementing the API is mostly irrelevant, and should be chosen according to the targeted platform's suitability (e.g. ObjC/Swift for iOS, Java/Kotlin for Android). We choose C++ here just for developer familiarity (although at a very low level, C++ is suitable for both mobile platforms).

C++ API

We can then use the above API in the following manner:
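A minimal sketch of the imagined API surface and its usage (all names here are illustrative, not final):

// Imagined C++ API surface (illustrative only):
//   class AotModule {
//    public:
//     static std::unique_ptr<AotModule> Load(const std::string &path);
//     AotKernel *GetKernel(const std::string &name);
//     AotField *GetField(const std::string &name);
//   };

auto module = AotModule::Load("/path/to/dir/opengl.tcb");
auto init = module->GetKernel("init");
auto substep = module->GetKernel("substep");
init->Launch();
for (int frame = 0; frame < 60; frame++) {
  substep->Launch();
}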
Taichi kernel template
So far, we have only talked about regular Taichi kernels. However, there is a special kind of kernel: a Taichi kernel with at least one ti.template parameter. The special part about this is that Taichi will instantiate a separate kernel body for different input arguments. Readers coming from a C++ background can relate this to the C++ function template: it is not until you invoke a function template with the actual type arguments filled in that the compiler instantiates a function definition for you. As a result, one cannot identify a compiled Taichi kernel just by its name. Instead, it is the combination of a string (the kernel template name) and the template args. Module.add_kernel_template() is for handling this situation.

Then on the app side, we can retrieve and run these instantiated kernels with the code below.
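A minimal sketch, assuming a hypothetical template-lookup API where an instantiation is identified by the (template name, template args) pair; the "fill" kernel and the "x" arg are made-up examples:

// Retrieve the instantiation of kernel template "fill" for template arg "x".
auto fill_templ = module->GetKernelTemplate("fill");
auto fill_for_x = fill_templ->Instantiate(/*template_args=*/{"x"});
fill_for_x->Launch();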
Ndarray: making data containers more flexible
Currently, Taichi field is the official way for passing data between the kernel side and the host side. However, it comes with a few restrictions:

- All the fields live inside the root buffer, which is backed by a single device buffer (an SSBO for OpenGL, an MTLBuffer for Apple Metal, etc.). So with an SSBO x_ssbo holding the particles' position in my particle system, we have to run the Taichi kernels, then copy the data from the root buffer to x_ssbo. Ideally, we can achieve zero-copy here by just binding x_ssbo to the GL shaders generated by Taichi.

To overcome these disadvantages, we have been prototyping a new data container called Ndarray. Ndarray can be viewed as a more flexible and systematic implementation of Taichi's external array.
Say we'd like to pass a 2-D array of vec2 into a Taichi kernel; the kernel can be rewritten to take an Ndarray parameter instead of reading a field. If our app already has an SSBO x_ssbo of the matching traits, we can pass it to the compiled kernel in this way:
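A minimal sketch in terms of the LaunchContextBuilder API quoted earlier in this thread; whether a raw GL buffer handle can be passed through like this is an open design question, and `kernel`, `aot_kernel`, `x_ssbo` and `n_particles` are assumed handles:

// Bind the app-owned SSBO to the kernel's Ndarray/external-array parameter.
LaunchContextBuilder builder(kernel);
builder.set_arg_external_array(/*arg_id=*/0,
                               /*ptr=*/static_cast<uintptr_t>(x_ssbo),
                               /*size=*/n_particles * 2 * sizeof(float),
                               /*is_device_allocation=*/true);
aot_kernel->launch(&builder.get_context());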
Implementation Roadmap

Q & A
What Taichi features do you plan to support?
What Taichi features are currently out of the scope?
- @ti.data_oriented
Other limitations?
The logic to invoke these kernels will still need to be re-written in the users' app (e.g. the run_jit() body in the above example). We may consider adding a compute graph in the future. Welcome discussion & proposals!

How to locate a Taichi kernel?
For a regular kernel, the kernel name (a string) is enough as the identifier. For a kernel template, it is a combination of the kernel name and the instantiating template args.
How to support upgrading?
We can include a version into the AOT module.
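For instance, the module's metadata could carry a version entry along these lines (format purely illustrative):

{
  "aot_format_version": 1,
  "arch": "opengl"
}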