Replace pybind11 #6395
Comments
FYI, is it possible to quickly try nanobind as suggested in #4830 (comment)?
Thanks. @k-ye
But nanobind only supports Python 3.8+? I'm also wondering whether we should standardize the APIs used to build new frontends, which would require a stable C API. With a C API, we could bind it to Python using ctypes. Of course, with a C API we can't conveniently export a C++ class as a Python class.
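To make the ctypes idea concrete, here is a minimal sketch of binding a plain C function from Python. It uses `strlen` from the C standard library purely as a stand-in for a hypothetical Taichi C-API entry point; no actual Taichi C API is implied, and loading the library this way assumes a POSIX system.

```python
import ctypes
import ctypes.util

# Load the C standard library. If find_library returns None, CDLL(None)
# exposes the symbols already linked into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature explicitly: size_t strlen(const char *s);
# strlen is only a stand-in for a hypothetical C-API function.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Call the C function through ctypes and return its result."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("taichi"))  # prints 6
```

Declaring `argtypes` and `restype` up front keeps the conversion rules explicit. As noted above, this style exposes free functions only, so a C++ class would have to be wrapped by hand on the Python side.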
https://github.com/bobcao3/taichi/blob/dart-native/c_api/include/taichi/frontend_ir.h is a half-finished attempt at a Taichi compiler C API, originally meant for binding to Dart (and since it is a C API, it should be usable from anywhere).
Awesome work! BTW, I want to bind Taichi to a programming language I have been developing in my spare time, but we don't have standard, stable APIs for building the Taichi AST 😂.
I think a critical part of the experience is reducing launch overhead. The API surface for launching kernels is quite a bit smaller than the AST APIs. Starting from there could be easier?
Thanks for your suggestion!
Agree.
Yes, starting from a small part of the APIs is easier.
My preliminary thought is similar to @ailzhang's
I extended @ailzhang's
The result is... 🤔
pybind took 1.4078617095947266e-06s
ctypes took 6.830692291259766e-07s
cpython took 4.100799560546875e-07s
nanobind took 6.326436996459961e-06s
P.S. Test env: macOS, M1
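For readers who want to reproduce per-call numbers like these, the following is a minimal timing sketch using only the standard library. It is not the benchmark used in this thread; `py_add` is a hypothetical placeholder for whichever bound function (pybind11, ctypes, a CPython extension, nanobind) is being measured.

```python
import timeit


def per_call_seconds(fn, *args, number=100_000, repeat=5):
    """Return the best observed average time of one fn(*args) call, in seconds.

    The lambda wrapper adds a small constant cost to every measurement,
    so compare results relative to each other rather than as absolutes.
    """
    timer = timeit.Timer(lambda: fn(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def py_add(a, b):
    # Hypothetical placeholder for a function exported by a binding layer.
    return a + b


elapsed = per_call_seconds(py_add, 1, 2)
print(f"py_add took {elapsed}s per call")
```

Taking the minimum over several repeats reduces noise from other processes, which matters when the differences between binding mechanisms are well under a microsecond.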
Note on the benchmark data: don't just test functions that take a simple
The JIT AST construction overhead analysis above looks good. IIRC the pybind11 overhead mainly comes from RTTI (
I guess using a simple C API can already reduce the overhead significantly, since the calling mechanism becomes much simpler compared to C++.
A while (2 years?) ago I wrote a simple test script: taichi/misc/demo_launch_overhead.py, lines 1 to 19 in f573176
On my end (M1 Mac) such a kernel takes 8e-6s, more than 5x the pybind11 overhead in @PGZXB's result above. It is also worth looking into what else contributes to the 8e-6s launch overhead.
(Related issue: #4830)
Motivation
Because of the high overhead of pybind11 (see ailzhang/c_ext_for_py), using a more efficient method to export the Taichi core APIs to Python is necessary.
Preliminary Solution
Replace pybind11 with ctypes or cpython (or others) for HOT Taichi core APIs first, e.g., make_const_expr_int, expr_*, and so on.
TODO
THINKING...
Performance
...
Appendix
Counting API calls while running examples or tests (very ugly charts; ignore get_max_num_indices and pop_python_print_buffer):
Source code: PGZXB:dev-profile-ticore-APIs