Replace pybind11 #6395

PGZXB · 2022-10-20T09:33:52Z

(Related issue: #4830)

Motivation

Because of the high overhead of pybind11 (see ailzhang/c_ext_for_py), using a more efficient method to export the Taichi core APIs to Python is necessary.

Preliminary Solution

Replace pybind11 with ctypes or cpython (or others) for HOT Taichi core APIs first, e.g., make_const_expr_int, expr_* and so on.
THINKING...

TODO

THINKING...

Performance

...

Appendix

API call counts collected while running the examples and tests (very ugly charts; ignore get_max_num_indices and pop_python_print_buffer):
Source code: PGZXB:dev-profile-ticore-APIs
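
One way such per-API call counts could be gathered is to wrap each exported function with a counting shim. This is a minimal sketch, not the method used in the linked branch: `fake_core` is a hypothetical stand-in for the pybind11 extension module, and `expr_add` is an invented example of an `expr_*` function.

```python
import collections
import functools
import types

call_counts = collections.Counter()


def _counted(name, fn):
    """Return a wrapper that tallies one call under `name`, then forwards."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[name] += 1
        return fn(*args, **kwargs)
    return wrapper


def count_calls(module):
    """Wrap every public callable attribute of `module` in place."""
    for name, fn in list(vars(module).items()):
        if callable(fn) and not name.startswith("_"):
            setattr(module, name, _counted(name, fn))


# Hypothetical stand-in for the core module; the real functions live in C++.
fake_core = types.SimpleNamespace(
    make_const_expr_int=lambda dtype, value: ("const", dtype, value),
    expr_add=lambda lhs, rhs: ("add", lhs, rhs),
)

count_calls(fake_core)
a = fake_core.make_const_expr_int("i32", 1)
b = fake_core.make_const_expr_int("i32", 2)
fake_core.expr_add(a, b)
print(call_counts.most_common())
```

Wrapping exported classes this way would break `isinstance` checks against them, so a real profiler would probably restrict itself to plain functions and methods.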


k-ye · 2022-10-20T10:13:09Z

FYI, is it possible to quickly try nanobind as suggested in #4830 (comment)?

PGZXB · 2022-10-20T11:16:29Z

FYI, is it possible to quickly try nanobind as suggested in #4830 (comment)?

Thanks. @k-ye

Python 3.8+: nanobind heavily relies on PEP 590 vector calls that were introduced in version 3.8.

But nanobind only supports Python 3.8+?

And I'm thinking about whether we should standardize the APIs used to build new frontends, which requires stable C APIs. If we have the C APIs, we can bind them to Python using ctypes. Of course, with a C API we can't conveniently export a C++ class as a Python class.
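
For illustration, binding a plain C function with ctypes comes down to looking up the symbol and declaring its signature. No Taichi C API is specified in this thread, so the sketch below uses CPython's own C API (exposed as `ctypes.pythonapi`) purely as a stand-in library; a real binding would load the Taichi shared library with `ctypes.CDLL` instead.

```python
import ctypes
import sys

# Stand-in for a C library: CPython's own C API, available on every platform.
lib = ctypes.pythonapi

# const char *Py_GetVersion(void)
lib.Py_GetVersion.argtypes = []
lib.Py_GetVersion.restype = ctypes.c_char_p

# PyObject *PyLong_FromLong(long)
lib.PyLong_FromLong.argtypes = [ctypes.c_long]
lib.PyLong_FromLong.restype = ctypes.py_object

version = lib.Py_GetVersion().decode()
boxed = lib.PyLong_FromLong(42)
print(version, boxed)
```

Declaring `argtypes` and `restype` up front matters: without them ctypes assumes a C `int` return value, which truncates pointers on 64-bit platforms.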

bobcao3 · 2022-10-21T01:06:14Z

https://github.com/bobcao3/taichi/blob/dart-native/c_api/include/taichi/frontend_ir.h

Here was a half-made attempt to build a Taichi compiler C-API used to bind to Dart (and because it's CAPI it should be able to go everywhere)

PGZXB · 2022-10-21T02:36:03Z

https://github.com/bobcao3/taichi/blob/dart-native/c_api/include/taichi/frontend_ir.h

Here was a half-made attempt to build a Taichi compiler C-API used to bind to Dart (and because it's CAPI it should be able to go everywhere)

Awesome work! BTW, I want to bind Taichi to my programming language that I have been developing in my spare time, but we don't have standard and stable APIs to build Taichi AST😂.

bobcao3 · 2022-10-21T03:36:14Z

I think a critical part of experience is to reduce launch overhead. The API surface for launching kernels is quite a bit smaller than the AST APIs. Starting from there could be easier?

PGZXB · 2022-10-21T03:55:43Z

Thanks for your suggestion!

I think a critical part of experience is to reduce launch overhead.

Agree.

The API surface for launching kernels is quite a bit smaller than the AST APIs. Starting from there could be easier?

Yes, starting from a small part of the APIs is easier.

As a result I'd view this issue more to identify hotspots in py->c interaction and migrate them to cpython/ctypes step by step in a measurable way. We can probably employ cpython/ctypes in the critical parts for perf gain and keep some components in pybind11 to enjoy the ready-to-use C++ features. -- #4830 (comment)

My preliminary thought is similar to @ailzhang's.

PGZXB · 2022-10-21T11:11:49Z

I extended @ailzhang's c_ext_for_py to test nanobind (source code: PGZXB:c_ext_for_py).

The result is....🤔

pybind took 1.4078617095947266e-06s
ctypes took 6.830692291259766e-07s
cpython took 4.100799560546875e-07s
nanobind took 6.326436996459961e-06s

P.S. Test env: macOS, M1
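
Per-call figures like these can be collected with a small timeit harness. The sketch below is not the code from c_ext_for_py: it times a Python builtin against a ctypes call into CPython's own C API (`Py_IsInitialized`, chosen only because it is always available), and the helper name `per_call_seconds` is made up for the example.

```python
import ctypes
import timeit


def per_call_seconds(fn, number=20_000, repeat=5):
    """Best-of-`repeat` average wall time of a single call to `fn`."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


# int Py_IsInitialized(void), called through ctypes.
c_is_initialized = ctypes.pythonapi.Py_IsInitialized
c_is_initialized.argtypes = []
c_is_initialized.restype = ctypes.c_int

results = {
    "builtin abs": per_call_seconds(lambda: abs(-1)),
    "ctypes call": per_call_seconds(lambda: c_is_initialized()),
}
for name, seconds in results.items():
    print(f"{name} took {seconds}s")
```

Taking the minimum over several repeats reduces noise from other processes; both timings include the cost of the Python lambda, so only their difference is meaningful.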

yuanming-hu · 2022-10-24T02:20:33Z

Note on the benchmark data: don't just test functions that take a simple std::vector<int> as input :-)

The JIT AST construction overhead analysis above looks good.

IIRC, the pybind11 overhead mainly comes from RTTI (isinstance in pybind11 can take ~50 us). Such overhead may come from constructing the AST when JITing and from launching (testing argument types & casting to the Taichi kernel argument list). I don't remember whether the AST construction part involves smart pointers - if that is the case, we'd better test the libs against these cases with C++ types. (For ctypes/cython we can just use raw pointers, which will likely be faster.)

I guess using a simple C API can significantly reduce the overhead already since the calling mechanism becomes much easier compared to C++.

A while (2 years?) ago I wrote a simple test script:

taichi/misc/demo_launch_overhead.py

Lines 1 to 19 in f573176

    import time

    import taichi as ti

    ti.init()


    @ti.kernel
    def compute_div(a: ti.i32):
        pass


    compute_div(0)
    print("starting...")
    t = time.time()
    for i in range(100000):
        compute_div(0)
    print((time.time() - t) * 10, 'us')
    exit(0)

On my end (M1 Mac) such a kernel takes 8e-6s, more than 5x the pybind11 overhead in @PGZXB's result above. It is also worth looking into what else contributes to the 8e-6s launch overhead.
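
A first pass at finding what else contributes to that launch time could use the standard-library profiler. In this rough sketch `fake_launch` and `convert_args` are hypothetical stand-ins for a kernel call and its argument handling; with Taichi installed, the loop body would call a real kernel such as `compute_div(0)` instead.

```python
import cProfile
import io
import pstats


def convert_args(args):
    """Stand-in for argument type checks and casts done before a launch."""
    return [int(a) for a in args]


def fake_launch(*args):
    """Stand-in for a kernel launch; a real one would cross into C++."""
    return len(convert_args(args))


profiler = cProfile.Profile()
profiler.enable()
for _ in range(10_000):
    fake_launch(0)
profiler.disable()

# Render the ten most expensive entries by cumulative time into a string.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```

cProfile only sees Python-level frames and adds noticeable overhead per call, so time spent inside the binding layer shows up as a single opaque entry; it mainly helps separate Python-side work from the native call itself.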

PGZXB added the discussion Welcome discussion! label Oct 20, 2022

PGZXB self-assigned this Oct 20, 2022

PGZXB mentioned this issue Dec 28, 2022

Refactor kernel compilation #7002

Closed

36 tasks

PGZXB changed the title ~~Replace pybind11 (Tracer)~~ Replace pybind11 Apr 25, 2023

PGZXB closed this as completed Sep 14, 2023
