FYI, I polished the benchmarks #20
Replies: 4 comments 1 reply
-
Neat, thanks for this! Many performance improvements made in this project should be portable to
-
By the way, what do you mean by "caching the type look-up into a static variable is a very large optimization which should be done first"?
-
Currently, to cast from a C++ type to a Python type, the look-up is dynamic and done each time. There is also another look-up to check whether a given object has already been bound, to avoid creating a new one (this should likely be opt-in, but that's another question). When we convert an std::vector into a Python list, the unordered_map mapping from the [...]
I had a change in a local client at some point; here is what I could find from some demo code (it's a demo, it does not deal with all cases, and I haven't pushed it to pybind11 internals):
This was several times faster. In my mind, there is no need for a dynamic look-up: types are registered once and for all at start-up and then left unchanged. I don't know if that's precise enough and whether you see what I am trying to say. Maybe I missed some reason why we cannot store the result of the look-up in a static variable.
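To make the idea concrete, here is a minimal sketch of the technique. It is not the demo code mentioned above and it does not use pybind11 at all: `TypeRecord`, `registry`, `find_dynamic`, `find_cached`, and `Point` are hypothetical names standing in for whatever a binding library keeps internally. It assumes every type is registered before its first cached look-up and is never unregistered afterwards.

```cpp
#include <cstddef>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-type record a binding library keeps.
struct TypeRecord {
    std::string python_name;
};

// Hypothetical global registry, assumed to be filled once at start-up.
inline std::unordered_map<std::type_index, TypeRecord>& registry() {
    static std::unordered_map<std::type_index, TypeRecord> r;
    return r;
}

// Counts hash-map look-ups so the two strategies can be compared.
inline std::size_t& lookup_count() {
    static std::size_t n = 0;
    return n;
}

template <typename T>
void register_type(std::string python_name) {
    registry()[std::type_index(typeid(T))] = TypeRecord{std::move(python_name)};
}

// Dynamic strategy: one hash-map look-up on every call.
template <typename T>
const TypeRecord* find_dynamic() {
    ++lookup_count();
    auto it = registry().find(std::type_index(typeid(T)));
    return it == registry().end() ? nullptr : &it->second;
}

// Cached strategy: the look-up runs once per type T, then the pointer is
// reused from a function-local static. Pointers to unordered_map elements
// stay valid across rehashing, so this is safe as long as the entry is
// never erased. Caveat: if T is not registered yet on the first call,
// nullptr is cached permanently.
template <typename T>
const TypeRecord* find_cached() {
    static const TypeRecord* cached = find_dynamic<T>();
    return cached;
}

// Hypothetical user type.
struct Point {
    double x;
    double y;
};

// Simulates converting a vector element by element, where each element
// needs the record for T. Returns the number of hash-map look-ups done.
template <typename T>
std::size_t lookups_for_vector(const std::vector<T>& v, bool use_cache) {
    lookup_count() = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        const TypeRecord* rec = use_cache ? find_cached<T>() : find_dynamic<T>();
        (void)rec;  // A real caster would use rec to build the Python object.
    }
    return lookup_count();
}
```

With a vector of 1000 `Point` values, the dynamic path performs 1000 hash-map look-ups while the cached path performs at most one. The trade-off is the assumption stated above: a type registered late, or removed at runtime, would not be seen by the cached version.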
-
In practice, what we did was to use the C API, because the performance of pybind11 was too low: https://github.com/tensorflow/tensorflow/blob/e70e3196b7f130b6c5bd9e73a8411a41869b5770/tensorflow/compiler/xla/python/py_buffer.cc#L458-L489 and we have IsInstance and co here.
As this lives within large systems (Tensorflow, JAX, and plenty of others) which are inter-operable, we cannot easily change how wrappers are done. So even though changing pybind11 is harder, it may be a more realistic approach than breaking interfaces that people are using. So I sympathize with the idea, but it's unclear to me what the best approach is.
I am trying to get closer to pybind11 governance and introduce some changes there. I think that we can get very good performance (I don't remember the exact numbers, but with the pybind11 object data layout, I had very close performance for casting back and forth). It's hard to know when to start from scratch versus improving things in place, but I believe that we (the whole computer science community) probably start from scratch more often than is optimal.
Thanks for the pointers and the discussion! I hope to get more time soon to keep improving these benchmarks.
-
Hi,
I think good benchmarks are key to any project, and they were seriously lacking for pybind11.
I improved your benchmarks a little and moved them to https://github.com/jblespiau/pybind11_benchmarks so that one can just clone the repository and run them (the benchmarks you had were assuming OSX, and you had not explained how you had installed things).
I also have other benchmarks on my computer using the raw C++ Python API, and I plan to add them, so we can compare pybind11 & nanobind wrappers with these versions.
This is mostly an FYI, in case you would like to contribute to it whenever you improve your benchmarks.